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In  this  dissertation,  we  present  four  algorithms  for  reconstructing  high-resolution 
images  in  PET.  The  first  algorithm,  referred  to  as  the  penalized  maximum  likelihood 
(PML)  algorithm,  iteratively  minimizes  a PML  objective  function.  At  each  iteration, 
the  PML  algorithm  generates  a function,  called  a surrogate  function,  that  satisfies 
certain  conditions.  The  next  iterate  is  defined  to  be  the  nonnegative  minimizer  of 
the  surrogate  function.  The  PML  algorithm  utilizes  standard  de-coupled  surrogate 
functions  for  the  maximum  likelihood  objective  function  of  the  data  and  de-coupled 
surrogate  functions  for  a certain  class  of  penalty  functions.  As  desired,  the  PML 
algorithm  guarantees  nonnegative  estimates  and  monotonically  decreases  the  PML 
objective  function  with  increasing  iterations.  For  the  case  where  the  PML  objective 
function  is  strictly  convex,  which  is  true  for  the  class  of  penalty  functions  under 
consideration,  the  PML  algorithm  has  been  shown  to  converge  to  the  minimizer  of 
the  PML  objective  function. 

The  drawback  of  the  PML  algorithm  is  that  it  converges  slowly.  Thus,  a “fast” 
version  of  the  PML  algorithm,  referred  to  as  the  accelerated  PML  (APML)  algorithm, 
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was  developed  where  an  additional  search  step,  called  a pattern  search  step,  is  per- 
formed alter  each  standard  PML  iteration.  In  the  pattern  search  step,  an  accelerated 
iterate,  which  has  lower  cost  than  the  standard  PML  iterate,  is  found  by  solving  a 
certain  constrained  optimization  problem  that  arises  at  each  pattern  search  step.  The 
APML  algorithm  retains  the  nice  properties  of  the  PML  algorithm. 

The  third  algorithm,  referred  to  as  the  quadratic  edge  preserving  (QEP)  algo- 
rithm, aims  to  preserve  edges  in  the  reconstructed  images  so  that  fine  details,  such 
as  small  tumors,  are  more  resolvable.  The  QEP  algorithm  is  based  on  an  iteration 
dependent,  de-coupled  penalty  function  that  introduces  smoothing  while  preserving 
edges.  The  penalty  function  was  developed  by  modifying  the  surrogate  functions  of 
the  penalty  function  for  the  PML  method. 

In  PET,  there  are  several  errors  that  have  the  net  effect  of  introducing  blur 
into  the  reconstructed  images.  We  propose  a method  that  aims  to  reduce  blur  in 
PET  images.  The  method  is  based  on  the  assumption  that  the  “true”  probability 
matrix  for  the  observed  emission  data  is  a product  of  an  unknown  nonnegative  matrix, 
called  a scatter  matrix,  and  a “conventional”  probability  matrix.  Under  the  suggested 
framework,  the  problem  is  to  jointly  estimate  the  scatter  matrix  and  emission  means. 
We  propose  an  alternating  minimization  algorithm  to  estimate  them  by  minimizing 
a certain  distance. 

The  algorithms  are  qualitatively  and  quantitatively  assessed  using  synthetic 
phantom  data  and  real  phantom  data. 
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CHAPTER  1 
BACKGROUND 

Medical  imaging  modalities,  such  as  X-ray  computed  tomography  and  magnetic 
resonance  imaging,  are  used  to  obtain  images  of  anatomical  structures  within  the 
human  body.  However,  in  certain  medical  applications,  it  is  also  important  to  get 
physiological  information.  The  reason  is  because  physiological  changes  can  indicate 
disease  states  earlier  than  anatomical  changes  [1],  Positron  emission  tomography 
(PET)  and  single-photon  emission  computed  tomography  (SPECT)  are  widely  used 
medical  imaging  modalities  that  acquire  physiological  information  on  both  human 
and  animal  subjects. 

In  SPECT,  physiological  information  is  obtained  by  imaging  the  distribution 
of  gamma-ray  or  X-ray  emitting  radio-isotopes  within  the  human  body  [1],  After 
the  radio-isotopes  are  introduced  into  the  human  body,  the  radio-isotopes  decay  and 
emit  gamma-ray  or  X-ray  single  photons.  A SPECT  scanner  detects  these  photons 
with  help  of  collimators  that  are  made  of  lead.  Since  a large  percentage  photons  are 
absorbed  by  the  lead  collimators,  the  sensitivity  and  accuracy  of  SPECT  is  limited. 

In  PET,  there  is  no  need  for  lead  collimators  because  collimation  is  performed  by 
electronic  circuits  that  are  connected  to  the  detectors.  Consequently,  PET  possesses 
relatively  high  sensitivity  and  accuracy,  when  compared  to  SPECT.  Although  cost 
is  the  major  limitation  of  PET,  recent  research  advances,  such  as  a less  expensive 
materials  for  detectors  and  scanner  configurations  that  need  a smaller  number  of 
detectors  than  conventional  scanners,  have  helped  decrease  its  cost  [1].  The  main 
advantage  of  SPECT  over  PET  is  its  substantially  lower  cost. 
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We  present  a brief  overview  of  PET.  PET  scanner,  and  certain  errors  in  PET 
in  Section  1.1,  1.2,  and  1.3,  respectively.  Then,  we  provide  some  background  on  the 
image  reconstruction  problem  in  PET. 

1.1  Overview  of  Positron  Emission  Tomography  (PET) 

Most  clinical  applications  of  PET  are  in  oncological  cases  involving  the  diagnosis 
and  staging  of  cancer,  treatment  planning  and  monitoring  of  tumors,  detection  of 
recurrent  tumors,  and  localization  of  biopsy  sites  in  cases  when  there  are  tumors  in 
the  head  or  neck  [2,3].  PET  is  also  used  in  the  diagnosis  of  coronary  artery  disease 
and  other  heart  diseases  [2,3]. 

In  PET,  physiological  information  is  acquired  by  imaging  the  distribution  of 
positron-emitting  isotopes,  such  as  13N,  150,  18F,  and  nC,  within  the  human  body  [3]. 
The  positron-emitting  isotopes  are  bound  to  compounds  with  known  biochemical 
properties.  Compounds  that  are  labeled  with  positron-emitting  isotopes  are  called 
radiopharmaceuticals.  The  choice  of  the  radiopharmaceutical  depends  on  its  appli- 
cation. For  example,  2-[18F]fluoro-2-deoxy-d-glucose  ([18F]FDG)  is  used  for  imaging 
brain  tumors  while  [13N]  ammonia  is  used  for  the  detection  of  coronary  artery  dis- 
ease [3].  After  the  radiopharmaceutical  is  introduced  into  the  subject  (injection  or 
inhalation),  positrons  are  emitted  as  the  positron-emitting  isotopes  decay.  An  emitted 
positron  annihilates  with  a nearby  electron  within  the  body  causing  the  generation 
of  two  high  energy  (511keV)  photons.  The  two  photons,  which  can  penetrate  the 
subject,  travel  in  almost  opposite  directions.  Ideally,  photon  pairs  that  are  generated 
by  a positron  annihilating  with  an  electron  will  be  detected  by  a pair  of  detectors. 

If  two  electronically  connected  detectors  detect  a pair  of  photons  within  a short 
time  interval  (e.g.,  < 10  ns),  then  the  detection  is  recorded  along  the  line  connecting 
the  two  detectors,  which  is  called  a line-of-response.  In  the  absence  of  an  error, 
the  detection  indicates  that  there  is  an  annihilation  somewhere  along  the  line-of- 
response.  The  detection  of  a photon  pair  is  referred  to  as  a coincidence.  In  addition, 
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the  timing  interval  of  a scanner  that  is  used  to  define  a coincidence  is  called  the 
scanner’s  coincidence  timing  window.  Since  there  are  many,  many  detector  pairs 
in  a PET  scanner,  sufficient  information  is  available  for  reconstructing  a map  of 
the  concentration  of  the  radiopharmaceutical.  For  each  detector  pair,  the  number 
of  coincidences  that  occur  during  the  scan  are  summed.  The  emission  data  is  the 
coincidence  sums  for  all  of  the  possible  detector  pairs. 

The  key  idea  in  PET  is  that  emission  data  depends  on  the  distribution  of  the 
radiopharmaceutical  within  the  subject  being  scanned,  which  in  turn  depends  on 
the  metabolism  of  the  subject.  Consequently,  numerous  researchers  have  developed 
algorithms  that  reconstruct  PET  images  whose  pixel  values  represent  the  distribution 
of  the  radiopharmaceutical,  and,  ultimately,  the  metabolism  of  the  subject. 

1.2  PET  Scanner 

Typical  PET  scanners  have  a diameter  of  80  - 100  cm  and  an  axial  extent  of 
10  — 20  cm  [4].  Figure  1 1 (a)  shows  a simplified  PET  scanner  and  Figures  1 1 (b)  and 
(c)  show  two-dimensional  views  of  the  scanner.  Usually,  a PET  scanner  consists  of 
hundreds  of  rectangular  bundles  of  crystals  that  are  formed  to  make  between  20  - 30 
rings  of  detectors,  where  each  detector  ring  contains  300-600  detectors.  Each  bundle 
of  crystals  is  connected  to  a few  (e.g.,  2 — 8)  photo-multiplier  tubes  (PMTs).  Figure 
1 1 (d)  shows  such  a block  of  crystals  coupled  to  four  PMTs.  When  a photon  interacts 
with  a crystal,  light  photons  are  emitted  and  the  PMTs  collect  the  photons.  From  the 
collected  light,  a PET  scanner  determines  the  crystal  within  which  the  scintillation 
occurs. 

State-of-the-art  PET  scanners  generally  provide  two  scan  modes:  slice-collimated 
mode  and  fully  three-dimensional  mode  [4],  In  slice-collimated  mode  or  septa-extended 
mode,  thin  tungsten  rings,  called  septa,  are  placed  between  the  detector  rings.  Figure 
1 2 (a)  illustrates  slice-collimated  mode  in  which  coincidences  are  collected  within 
a detector  ring.  In  fully  three-dimensional  mode,  the  septa  is  removed  from  the 
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(a) 


(b) 


Figure  1 1:  Simplified  PET  scanner:  (a)  a simplified  full-ring  PET  scanner  with  8 
detector  rings,  (b)  two-dimensional  view  of  the  scanner  at  x = 0,  (c)  two-dimensional 
view  of  the  scanner  at  y = 0,  and  (d)  a block  detector  consisting  of  an  array  of  8 x 8 
crystals  coupled  to  four  PMTs,  where  the  origin  of  the  rectangular  coordinate  system 
is  at  the  center  of  the  scanner. 


(a) 


(b) 


Figure  1 2:  Scan  modes:  (a)  septa-extended  mode  and  (b)  fully  three-dimensional 
mode.  The  dotted  lines  represent  line-of- responses. 


(a) 


(b) 


Figure  1-3:  Projection  definition  using  zig-zag  scan:  (a)  a set  of  detector  pairs  that 
define  a projection  and  (b)  another  projection.  A dashed  line  represents  a detector 
pair. 
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Figure  1 4:  Illustration  of  positron  range  and  a coincidence.  Two  concentric  circles 
denote  a detector  ring. 

scanner.  Figure  1 2 (b)  illustrates  fully  three-dimensional  mode  in  which  allowable 
line-of-responses  are  not  restricted  to  occur  within  a detector  ring. 

Within  a detector  ring,  detector  pairs  are  grouped  into  projections,  where  a 
projection  is  a set  of  detector  pairs  that  are  defined  by  a particular  “zig-zag  scan”. 
Figure  1 3 (a)  shows  a projection  and  the  defining  zig-zag  scan.  Figure  1 3 (b)  shows 
another  projection.  A sinogram  is  defined  to  be  a S x T matrix,  where  each  row  of  the 
matrix  contains  a projection,  S is  the  number  of  detector  pairs  within  the  projection, 
and  T is  the  number  of  projections  in  the  emission  data. 

1.3  Sources  of  Error 

After  a positron  is  emitted,  it  travels  a short  distance  before  it  annihilates  with 
a nearby  electron  within  a subject.  The  distance  between  the  locations  at  which  the 
emission  and  the  annihilation  take  place  is  called  positron  range.  The  positron  range 
is  proportional  to  the  reciprocal  of  the  density  of  the  material  an  emitted  positron 
travels  [5].  Figure  1 4 is  an  illustration  of  positron  range  for  a simplified  scanner  ge- 
ometry. The  positron  range  depends  on  the  energy  an  emitted  positron  deposits  (and, 
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Figure  1 5:  Illustration  of  non-collinearity  of  line-of-response.  The  dashed  line  indi- 
cates the  path  of  the  photon  pair  if  they  had  departed  in  exactly  opposite  directions. 
The  arrows  show  the  actual  photon  paths. 

consequently,  the  chosen  positron-emitting  isotope).  For  typical  positron-emitting 
isotopes,  a full  width  half  maximum  (FWHM)  of  the  distribution  of  positron  range 
is  a few  millimeters.  For  example,  the  FWHMs  of  the  distribution  of  the  positron 
range  for  18F  and  150  are  about  2 mm  and  8 mm,  respectively  [5,  p.  331]. 

Two  photons  generated  by  an  annihilation  usually  do  not  propagate  in  exactly 
opposite  directions.  This  phenomenon  is  called  non-collinearity  of  line-of-response. 
Figure  1 5 is  an  illustration  of  non-collinearity  of  line-of-response  for  a simplified 
scanner  geometry.  For  a fixed  direction  that  one  of  the  two  photons  propagates, 
we  refer  to  the  angle  between  the  opposite  direction  of  the  fixed  direction  and  the 
actual  direction  that  the  other  photon  travels  in  as  the  “angle  of  non-collinearity”. 
The  distribution  of  the  angle  of  non-collinearity  can  be  approximated  by  a Gaussian 
distribution  with  a FWHM  of  0.5  degree  [5]. 

The  angle  between  the  path  that  a photon  propagates  and  the  face  of  a detector 
when  the  photon  hits  the  detector  is  referred  to  as  the  incident  angle.  The  way  that 
a photon  interacts  with  a detector  depends  on  the  incident  angle.  If  the  path  that  a 
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(a) 


(b) 


Figure  1 6:  Single  and  scatter:  (a)  Illustration  of  attenuation  and  a single,  and 
(b)  Illustration  of  a scattered  coincidence.  The  arrow  penetrating  the  detector  ring 
denotes  that  the  photon  is  scattered  through  an  oblique  angle  such  that  it  does  not 
hit  a detector.  The  dotted  line  denotes  the  incorrectly  positioned  line-of-response. 

photon  travels  is  not  perpendicular  to  the  face  of  a detector  when  the  photon  hits  the 
detector,  then  it  is  possible  that  the  photon  does  not  interact  with  the  detector  that 
it  strikes.  The  photon  may  interact  with  another  detector  that  is  nearby  the  detector 
the  photon  hits  originally.  This  phenomenon  is  termed  by  detector  penetration.  Some 
methods  are  proposed  to  account  for  detector  penetration  [6,7]. 

When  a 511keV  photon  propagates  within  a subject  and  interacts  with  an  elec- 
tron, the  photon  may  undergo  a phenomenon  known  as  Compton  scattering.  When 
a photon  hits  an  electron,  the  photon  gives  part  of  its  energy  to  the  electron  and 
deflects  from  its  original  path  if  the  photon  has  sufficient  energy.  This  phenomenon 
is  called  Compton  scattering.  Compton  scattering  can  lead  to  three  kinds  of  error: 
attenuation,  scatter,  and  accidental  coincidence. 

Most  scattered  photons  are  scattered  out  of  the  scanner’s  field-of-view  so  that 
many  of  them  are  not  detected.  This  phenomenon  is  called  attenuation.  Figure  1 
6 (a)  illustrates  attenuation,  where  one  of  the  two  photons  of  an  annihilation  does 
not  hit  a detector  due  to  Compton  scattering.  Figure  1 6 (a)  also  illustrates  an 
event  called  a single,  where  only  one  of  the  emitted  photons  of  an  annihilation  is 
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(a)  (b) 


Figure  1 7:  Accidental  coincidence:  (a)  Illustration  of  an  accidental  coincidence  due 
to  two  annihilations  occurring  at  almost  the  same  time  and  (b)  Illustration  of  an 
accidental  coincidence  due  to  Compton  scattering. 

detected.  Note  that  attenuation  leads  to  an  incorrect  decrease  in  emission  counts.  In 
order  to  address  attenuation,  numerous  correction  methods  have  been  proposed  (see 
e.g.,  [8  10]). 

Consider  an  annihilation  event  where  Compton  scattering  occurs.  It  is  still  possi- 
ble that  a detector  pair  detects  both  photons  even  though  one  or  both  of  the  photons 
may  have  undergone  Compton  scattering.  Such  an  event  is  called  a scattered  co- 
incidence or  scatter,  which  is  illustrated  in  Figure  1 6 (b).  Note  that  a scattered 
coincidence  leads  an  incorrect  increase  in  emission  counts.  Since  a scattered  photon 
loses  part  of  its  energy,  the  energy  of  detected  photons  may  be  used  to  discriminate 
between  unscattered  photons  and  scattered  photons  [11,  pp.  65-69]. 

Two  photons  arising  from  different  annihilations  can  be  recorded  by  a detector 
pair.  This  event  is  called  an  accidental  coincidence.  Figure  1 7 (a)  illustrates  an 
accidental  coincidence  due  to  two  annihilations  occurring  at  almost  the  same  time. 
Sometimes,  an  accidental  coincidence  may  be  due  to  Compton  scattering  as  illustrated 
in  Figure  1 7 (b).  Like  scatter,  an  accidental  coincidence  leads  an  incorrect  increase 
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ill  emission  counts.  For  many  PET  scanners,  the  mean  accidental  coincidence  rate  is 
estimated  using  a “delayed”  timing  window  technique  [12]. 

The  efficiency  of  a detector  pair  is  defined  to  he  the  probability  that  a coincidence 
is  recorded  when  a photon  pair  hit  the  detectors.  Ideally,  this  probability  should  be 
one.  However,  the  efficiencies  of  detector  pairs  are  non-uniform  because  of  their 
geometric  differences  and  the  non-uniform  physical  characteristics  of  detectors.  The 
non-uniformity  of  detector  pairs  is  referred  to  as  detector  inefficiency.  To  address 
detector  inefficiency,  correction  methods  such  as  [13  15]  have  been  proposed. 

1.4  System  Model  for  Emission  Data 
In  1982,  Shepp  and  Vardi  proposed  a Poisson  model  for  PET  emission  data  and 
an  algorithm,  known  as  the  maximum  likelihood  expectation  maximization  (MLEM) 
algorithm,  for  reconstructing  maximum  likelihood  (ML)  emission  images  [16].  In  the 
Poisson  model,  the  region-of-interest  is  divided  into  J equal  sized  volume  elements, 
called  voxels.  Ultimately,  the  goal  is  to  estimate  the  mean  number  of  positrons  emit- 
ted from  each  voxel.  Let  the  ith  component  of  a vector  d.  di,  represents  the  observed 
number  of  photon  pairs  recorded  by  the  ith  detector  pair  and  the  jth  component  of  a 
vector  x,  Xj,  represents  the  unknown  mean  number  of  emissions  from  the  jth  voxel. 
Further,  let  I and  J denote  the  number  of  detector  pairs  and  number  of  voxels,  re- 
spectively. The  / x 1 vector  d and  the  J x 1 vector  x are  the  emission  data  and  the 
unknown  emission  mean  vector,  respectively.  Let  Vij  denotes  the  probability  that  an 
annihilation  in  the  jth  voxel  leads  to  a photon  pair  being  recorded  by  the  ith  detector 
pair.  The  I x J probability  matrix  V has  Vij  as  its  ( i,j)th  element.  In  the  Poisson 
model,  Shepp  and  Vardi  assumed  that  the  emission  data  d is  an  observation  of  a ran- 
dom vector  D with  elements  {Z?j}(=1  that  are  Poisson  distributed  and  independent. 
For  all  i , the  mean  of  the  random  variable  D j is 

j 

E{Di}  = ^ VjjXj. 

j= i 


(1.1) 
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In  practice,  the  probability  is  unknown  and  must  be  estimated  somehow.  The 
simplest  way  to  estimate  the  probability  matrix  V is  the  angle-of-view  (AOV)  method 
[16].  hi  the  AOV  method,  a detector  ring  and  a detector  are  modelled  by  a circle  and 
an  arc  on  the  circle,  respectively.  Moreover,  within  a voxel,  all  emitted  positrons  are 
assumed  to  be  emitted  from  the  center  of  the  voxel.  Figure  1 8 illustrates  a detector 
ring,  a detector  pair,  and  the  tube  (i.e.,  spatial  extent)  that  is  defined  by  the  detector 
pair.  In  the  figure,  the  AOV  from  the  point  gj  to  the  detector  pair  (y,  z ) is  also 
shown.  Specifically,  this  AOV  is  defined  to  be 


AOV  from  gj  to  ( y , z)  = 

I min {Zdygjby,  Z azg:bz , (tt  - /Laygjbz):  (n  - Zbygjaz)}, 

jo, 


o,  G tube  (y,  z) 

] (1.2) 

otherwise 


Said  another  way,  the  AOV  in  (1.2)  is  the  maximum  angle  for  which  a line  that  goes 
through  the  point  gj  will  simultaneously  intersect  both  detectors  of  the  detector  pair 
(y,  z).  In  the  AOV  method,  the  probability  V%j  is  defined  to  be 


AOV  from  gj  to  (y,  z) 

7T 


(1.3) 


where  the  detector  pair  (y,  z ) is  the  ith  detector  pair  and  the  point  gj  is  the  center 
point  of  the  jth  voxel.  In  the  AOV  method,  it  is  assumed  that  a photon  is  detected  by 
a detector  whenever  the  photon  hits  the  arc  corresponding  to  the  detector.  Clearly, 
the  AOV  method  does  not  account  for  detector  penetration  discussed  in  Section  1.3. 
Some  methods  have  been  developed  to  address  errors  due  to  detector  penetration  [6,7]. 

In  this  dissertation,  we  make  the  following  mild  assumptions  on  the  probability 
matrix  V and  the  emission  data  d. 

• (AS1)  V has  no  row  vector  of  zeros 

• (AS2)  dVj  ^ 0 for  all  j,  where  d is  the  transpose  of  d and  Vj  is  the  jth 


column  of  V. 


AOV  from  g,  to  (y,  z) 


Figure  1 8:  The  angle-of-view  from  the  point  gj  to  the  detector  pair  (y,  z)  is  shown. 
The  circle  denotes  a detector  ring.  The  arcs  ( ay,by ) and  (az,  bz)  represent  the  detec- 
tors y and  z,  respectively.  The  area  between  the  vertical  lines  (ay,  bz)  and  ( by , az) 
represents  the  tube  defined  by  the  detector  pair  {y,z). 
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To  see  the  implication  of  the  second  assumption,  consider  all  the  detector  pairs  where 
the  probability  of  recording  a photon  pair  generated  by  an  annihilation  in  the  jth 
voxel  is  non-zero.  Among  the  set  of  detector  pairs,  the  second  assumption  implies 
that  there  exists  at  least  one  detector  pair  with  non-zero  emission  counts.  The  ( AS2) 
is  expected  to  hold  whenever  the  duration  of  the  emission  scan  is  a reasonable  length 
of  time.  In  addition  to  (ASl)  and  (AS2),  we  assume  that  the  probability  matrix 
V accounts  for  errors  due  to  attenuation,  detector  inefficiency,  detector  penetration, 
positron  range,  non-collinearity  of  line-of-response,  and  scatter.  In  practice,  there 
are  correction  methods  for  attenuation  [9],  detector  inefficiency  [15],  and  detector 
penetration  [7].  However,  other  correction  methods  for  detector  penetration,  attenu- 
ation, and  detector  inefficiency  could  be  used  in  conjunction.  Note  that  in  Chapter 
6,  we  do  not  assume  that  the  probability  matrix  accounts  for  errors  due  to  scatter 
and  non-collinearity  of  line-of-response.  Instead,  we  present  a method  that  estimates 
the  probability  matrix  in  such  a way  that  errors  due  to  scatter  non-collinearity  of 
line-of-response  are  addressed. 

When  accidental  coincidences  (i.e.,  randoms)  are  considered,  the  Poisson  model 
must  be  modified  so  that  now  the  emission  data  d is  an  observation  of  a random 
vector  D that  is  Poisson  distributed  with  mean  (Vx  + p),  where  the  ith  component 
of  p,  pi , is  the  mean  number  of  accidental  coincidences  recorded  by  the  ith  detector 
pair,  i = 1,2 Usually,  it  is  assumed  that  the  mean  accidental  coincidence  rate 
p is  known.  In  practice,  the  mean  accidental  coincidence  rate  is  estimated  using  a 
“delayed”  time  window  technique  [12]. 

Given  the  emission  data  d , mean  accidental  coincidence  rate  p,  and  probability 
matrix  V,  the  problem  of  interest  is  to  estimate  the  mean  number  of  positrons  emitted 
from  each  voxel.  Since  it  is  assumed  that  the  data  are  independent,  it  follows  that 
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the  likelihood  function  for  emission  data  d is  given  by 

Pr{D  = d\x)  = n ^±£lte-\-Px+PI,  (1.4) 

7=1  dl- 

The  ML  estimate  of  x is  defined  to  be  the  maximizer  of  the  likelihood  function  over 
the  feasible  set.  Alternatively,  the  ML  estimate  of  emission  mean  vector  is  given  by 

xML  = argmax  T(a;)  , (1.5) 

x>0 

where  T(x)  = log  Pr{D  = d\x}  is  the  log  likelihood  function: 
i ii 

^ix)  = ^2  di  log {[Px  + p]i)  - YyPx\i  + _ l°g(^J)}  (1-6) 

i= 1 i=l  i= 1 

(note:  maximizing  the  likelihood  function  or  log  likelihood  function  are  equivalent 
operations). 

Although  the  ML  estimator  has  several  nice  theoretical  properties  [17,  ch.  7], 
images  produced  by  the  ML  method  (i.e. , the  MLEM  algorithm)  have  the  drawback 
that  they  are  extremely  noisy.  This  is  due  to  the  PET  image  reconstruction  problem 
is  ill-posed  because  of  the  facts  that  (1)  scan  times  for  data  acquisition  is  short,  (2) 
emission  data  contain  errors  due  to  attenuation,  scatters,  and  accidental  coincidences, 
and  (3)  the  data  obey  Poisson  statistics.  Currently,  the  most  popular  way  to  address 
the  ill-posed  nature  of  the  image  reconstruction  problem  is  through  the  use  of  penalty 
functions.  Numerous  penalized  maximum  likelihood  (PML)  methods  (also  known  as 
Bayesian  and  maximum  a posteriori  methods  (MAP))  have  been  proposed  [18  35]. 

In  the  MAP  method,  x is  assumed  to  be  an  observation  of  a random  variable  X 
with  known  distribution  and  the  a posteriori  distribution  (i.e.,  conditional  probability 
density  function  of  X given  D = d)  is  maximized.  After  some  manipulations  [17,  p. 
351],  the  MAP  estimate  is  found  to  be  the  maximizer  of  the  log  likelihood  function 
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plus  the  log  of  the  probability  density  function  of  X : 

xMAP  = argmax  T(a?)  + logfl(x)  , (1.7) 

x>0 

where  the  function  Q is  the  probability  density  function  of  X (i.e.,  prior  distribution). 
It  is  through  the  prior  distribution  f!  that  MAP  methods  have  the  ability  to  regularize 
the  image  reconstruction  problem. 

The  form  of  the  prior  distributions  commonly  used  in  PET  is  Q(x)  = 
where  the  function  A is  a scalar  valued  function  and  C and  f3  are  constants.  The 
constant  C > 0 is  chosen  such  that  the  area  under  the  distribution  fl  equals  one. 
The  constant  f3  > 0,  known  as  the  penalty  parameter,  controls  the  penalty  function’s 
degree  of  influence.  Since  it  is  known  that  PET  images  should  be  highly  correlated, 
the  penalty  function  A is  designed  in  such  a way  that  it  forces  the  estimates  of 
neighboring  voxels  to  be  similar  in  value.  Given  the  definition  of  G,  the  PML  or 
MAP  estimate  is  the  nonnegative  minimizer  of  the  PML  objective  function 

$(*)  = — 'L(cc)  + /3A(cc)  . 


(1.8) 


CHAPTER  2 
LITERATURE  REVIEW 

Although  the  MLEM  algorithm  by  Shepp  and  Vardi  [16]  produces  ML  estimates 
of  the  emission  means,  the  images  produced  by  the  ML  method  are  extremely  noisy 
due  to  the  fact  that  the  image  reconstruction  problem  in  PET  is  ill-posed.  As  dis- 
cussed in  Section  1.4,  the  reasons  are  that  (1)  scan  times  for  data  acquisition  is  short, 
(2)  emission  data  contain  errors  due  to  attenuation,  scatters,  and  accidental  coinci- 
dences, and  (3)  the  data  obey  Poisson  statistics.  One  way  to  obtain  PET  images  with 
sufficient  smoothness  is  to  terminate  the  MLEM  algorithm  before  the  log  likelihood 
function  is  maximized.  Of  course,  the  resulting  images  are  not  ML  images.  Another 
modification  of  the  MLEM  algorithm  is  to  first  obtain  an  ML  image  and  then  filter  it 
with  a low  pass  filter.  The  drawback  of  this  post-filtering  is  that  it  is  not  clear  how 
the  filter  is  chosen.  A variation  of  filtering  approach  just  described  is  to  filter  every 
MLEM  iterate,  which  was  suggested  by  Silverman  [36].  Silverman  did  not  provide  an 
answer  as  to  how  the  filter  should  be  chosen.  Denoising  emission  data  (i.e. , observed 
data)  is  another  way  to  regularize  the  PET  image  reconstruction  problem  [37,38]. 

Currently,  the  most  popular  way  to  introduce  regularization  is  through  the  use 
of  penalty  functions  that  force  the  estimates  of  neighboring  voxels  to  be  similar  in 
value.  The  basis  for  such  penalty  functions  is  that  PET  images  should  be  highly 
correlated.  The  first  part  of  the  chapter  is  devoted  to  so-called  penalized  maximum 
likelihood  (PML)  algorithms.  PML  algorithms  are  algorithms  that  minimize  PML 
objective  functions,  which  are  a sum  of  the  negative  log  likelihood  function  and  a 
penalty  function.  In  PET,  PML  and  maximum  a posteriori  (MAP)  are  terms  that 
are  used  for  methods  that  minimize  PML  objective  functions.  In  Section  2.1,  we 
briefly  review  existing  PML  algorithms. 
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Reconstructed  PML  images  are  blurred  by  errors  due  to  detector  penetration, 
positron  range,  non-collinearity,  and  scatter.  Errors  due  to  scatter  are  difficult  to 
correct  because  scatter  depends  on  the  activity  and  attenuation  within  the  subject 
and  the  scanner  design.  In  Section  2.2,  some  scatter  correction  methods  are  reviewed. 
The  regularized  image  reconstruction  algorithms  and  scatter  correction  algorithm  we 
propose  are  compared  with  the  existing  algorithms  in  Section  2.3. 

2.1  Penalized  Maximum  Likelihood  Algorithms 

In  1990,  Green  [23]  proposed  a PML  algorithm,  known  as  the  one-step-late  (OSL) 
algorithm.  The  algorithm  can  be  viewed  as  a fixed  point  iteration  that  is  derived  from 
the  Kuhn- Tucker  equations  [39,  pp.  36-49]  for  the  PML  optimization  problem.  Inci- 
dentally, Shepp  and  Vardi  showed  that  the  MLEM  algorithm  could  be  derived  from 
the  Kuhn- Tucker  conditions  in  a similar  way.  The  OSL  algorithm  is  straightforward 
to  implement,  but  nonnegative  estimates  cannot  be  guaranteed  and,  like  many  exist- 
ing algorithms,  convergence  is  an  open  issue.  Lange’s  goal  [24]  was  to  modify  the  OSL 
algorithm  in  such  a way  that  the  modified  algorithm  converges  to  the  PML  estimate. 
It  should  be  pointed  out  that  Lange’s  algorithm  requires  line  searches,  which  can  be 
computationally  expensive. 

Alenius  et  al.  [32]  suggested  a Gaussian  “type”  prior  that  depends  on  the  median 
of  voxels  within  local  neighborhoods,  and  introduced  an  algorithm  called  the  median- 
root-prior  (MRP)  algorithm.  The  MRP  algorithm  is  based  on  an  iteration  dependent 
objective  function.  Consequently,  it  really  cannot  be  considered  a PML  algorithm. 
Nevertheless,  the  MRP  algorithm  generates  “good”  images  in  the  sense  that  noise 
level  of  the  reconstructed  images  is  suppressed.  It  should  be  mentioned  that  a PML 
algorithm  was  derived  by  Hsiao  et  al.  [40]  that  resembles  the  MRP  algorithm  and 
performs  similar  to  the  MRP  algorithm.  The  PML  algorithm  by  Hsiao  et  al.  was 
derived  using  a prior  that  is  based  on  a certain  auxiliary  vector. 
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Levitan  and  Herman  [21]  proposed  a PML  algorithm  based  on  an  assumption 
that  the  prior  distribution  of  the  true  emission  means  was  a multivariate  Gaussian 
distribution.  The  assumption  led  to  a penalty  function  that  was  in  the  form  of  a 
weighted  least-squares  distance  between  x and  a reference  image.  However,  they  did 
not  indicate  how  the  reference  image  to  be  chosen. 

An  algorithm  was  proposed  by  Wu  [27]  using  a wavelet  decomposition  formu- 
lation. Specifically,  the  author  assumed  that  a vector  consisting  of  the  wavelet  co- 
efficients of  the  true  emission  means  is  a zero-mean  Gaussian  random  vector  with  a 
known  covariance  matrix.  From  this  assumption,  a prior  distribution  for  the  emission 
means  was  derived.  The  prior  distribution  is  a zero-mean  Gaussian  random  vector 
with  a covariance  matrix  that  depends  on  the  choice  for  the  wavelet  transform  and 
the  assumed  distribution  for  the  vector  of  wavelet  coefficients.  It  should  be  pointed 
out  that  the  assumption  was  not  clearly  justified  in  the  paper. 

Researchers  have  used  an  optimization  algorithm,  called  the  iterative  coordinate 
descent  (ICD)  algorithm  [41,  pp.  283-287],  to  obtain  estimates  for  various  penalty 
functions  [31],  [42].  Convergence  results  are  given  for  the  penalized  weighted  least- 
squares  method  [42]  and  both  algorithms  (i.e. , [31],  [42])  enforce  the  nonnegativity 
constraint.  Algorithms  based  on  ICD  algorithm  update  each  voxel  in  a serial  manner 
so  that  parallel  implementation  for  them  may  not  be  possible. 

De  Pierro  [25,30]  derives  PML  algorithms  that  minimize  certain  surrogate  func- 
tions that  he  constructs  by  exploiting  the  fact  that  the  log  likelihood  function  is 
concave  and  penalty  functions,  such  as  the  quadratic  penalty  function,  are  convex. 
Except  for  the  quadratic  penalty  function,  closed  form  expressions  for  the  minimizers 
of  the  surrogate  functions  do  not  exist.  Consequently,  an  optimization  method,  such 
as  Newton’s  method  [43,  pp.  201-202],  is  needed  to  minimize  the  surrogate  functions. 
De  Pierro  presents  some  convergence  results,  however  the  utility  of  his  methods  is 
unclear  because  no  experimental  results  were  provided.  It  should  be  noted  that,  in 
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the  transmission  tomography  paper  by  Erdo/jan  and  Fessler  [8],  a quadratic  surrogate 
function  was  used  for  a certain  class  of  penalty  functions.  The  quadratic  surrogate 
function  was  developed  by  Huber  [44,  pp.  184-186]. 

A fast  PML  method,  based  on  the  ordered  subset-EM  algorithm  [45],  was  pro- 
posed by  De  Pierro  and  Yamagishi  [33].  The  authors  show  that  if  the  sequence 
generated  by  the  algorithm  converges,  then  it  must  converge  to  the  true  PML  so- 
lution. Recently,  Ahn  and  Fessler  proposed  algorithms  [35]  that  are  based  on  the 
ordered  subset-EM  algorithm  [45],  an  algorithm  by  De  Pierro  and  Yamagishi  [33], 
and  an  algorithm  by  Erdogan  and  Fessler  [10].  Like  other  algorithms  based  on  the 
ordered  subset-EM  algorithm,  there  is  some  uncertainty  as  to  how  the  subsets  are  to 
be  chosen.  In  the  paper  by  Ahn  and  Fessler,  the  algorithms  are  said  to  converge  to  the 
nonnegative  minimizer  of  the  PML  objective  function  for  certain  penalty  functions 
and  their  accompanying  parameters  by  using  a relaxation  parameter  that  diminishes 
with  iterations.  Open  issues  are  how  the  relaxation  parameters  should  be  chosen  in 
practice  and  how  they  affect  the  performance  of  the  algorithms.  Convergence  rate 
varies  with  the  relaxation  parameter. 

2.2  Scatter  Correction  Methods 

Many  methods  have  been  proposed  to  correct  scattered  coincidences  [11,  ch. 
3].  They  can  be  classified  into  a few  categories:  (1)  energy  window  method,  (2) 
convolution/deconvolution  method,  and  (3)  calculating  scatter  distribution  method. 

One  of  the  scatter  correction  methods  is  based  on  the  use  of  multiple  (two  or 
more)  energy  windows  [46,47].  Recall  that  photons  lose  their  energy  when  they  have 
undergone  Compton  scattering.  The  principle  of  the  method  utilizing  energy  windows 
is  to  discard  detection  of  a photon  whenever  the  energy  of  the  photon  is  less  than 
511Kev.  Since  detectors  have  finite  energy  resolution,  there  is  a limitation  of  energy 
window  based  methods.  Consequently,  it  is  preferred  to  use  another  method  jointly 
with  the  energy  window  techniques. 
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Another  correction  method  for  scatter  is  the  convolution/deconvolution  method 
[48  51].  The  methods  in  [48,49]  assume  that  scattered  coincidences  (i.e. , scatter)  can 
be  approximated  by  a convolution  of  unscattered  coincidences  and  a certain  scatter 
function.  Under  the  assumption,  the  mean  scatter  in  the  observed  coincidences  is 
estimated  that  can  be  subtracted  from  emission  data  or  incorporated  in  the  system 
model  for  emission  data.  The  method  by  McKee  et  al.  assumes  that  the  distribution 
of  scattered  annihilations  can  be  approximated  by  the  convolution  of  the  distribution 
of  unscattered  annihilations  and  some  scatter  response  function  [50].  The  issue  of 
the  convolution/deconvolution  method  is  that  the  distribution  of  unscattered  coinci- 
dences (or  annihilations)  and  scatter  response  function  (or  scatter  function)  are  not 
known. 

Ollinger  introduced  a scatter  correction  method  that  calculates  scatter  distribu- 
tion using  an  analytical  equation,  transmission  images,  emission  images,  and  scanner 
geometry  [52],  Computational  cost  of  the  method  is  excessively  expensive,  thus  it 
might  be  hard  to  be  accepted  in  clinical  use  at  the  moment  of  writing. 

2.3  Summary  of  the  Proposed  Algorithms 
• In  Chapter  3,  we  present  an  algorithm  that  obtains  PML  estimates  for  a cer- 
tain class  of  edge-preserving  penalty  functions.  The  PML  algorithm  is  derived 
by  combining  the  convexity  idea  by  De  Pierro  [25, 30]  and  Huber’s  surrogate 
functions  [44],  Combining  two  existing  theories  in  such  a way  that  the  PML 
algorithm  is  convergent  is  new  to  PET  community.  In  theory,  the  algorithm 
guarantees  nonnegative  iterates,  monotonically  decreases  the  PML  objective 
function  with  increasing  iterations,  and  converges  to  the  solution  of  the  PML 
problem.  In  practice,  it  is  straightforward  to  implement  (i.e.,  no  additional 
hyperparameter  and  no  need  of  line  search)  and  it  can  incorporate  many  edge- 
preserving penalty  functions. 
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• In  Chapter  4,  we  develop  an  accelerated  version  of  the  PML  algorithm  by 
rising  the  pattern  search  idea  of  Hooke  and  Jeeve  [41,  pp.  287-291].  Using  this 
approach,  we  solve  a constrained  problem  at  each  pattern  search  step  that  leads 
to  improved  convergence  rates.  A modification  of  Hooke  and  Jeeve’s  direction 
vector  is  also  introduced  that  improves  performance.  It  should  be  mentioned 
that  Hooke  and  Jeeve’s  method  has  not  been  used  in  PET  image  reconstruction. 
The  proposed  algorithm  inherits  the  nice  properties  of  the  PML  algorithm  and 
converges  to  the  minimizer  of  the  PML  objective  function.  In  experiments,  the 
accelerated  algorithm  needed  less  than  about  one  third  of  the  CPU-time  that 
was  necessary  for  the  PML  algorithm  to  converge. 

• In  Chapter  5,  we  propose  a regularized  image  reconstruction  algorithm,  referred 
to  as  the  quadratic  edge  preserving  (QEP)  algorithm,  that  aims  to  preserve 
edges  through  the  use  of  certain  newly  developed  de-coupled  penalty  functions 
that  depend  on  the  current  estimate.  The  QEP  algorithm  was  motivated  by  the 
analysis  of  the  PML  algorithm.  The  algorithm  by  Alenius  et  al.  [32]  also  uses 
an  iteration  dependent  objective  function.  However,  it  should  be  mentioned 
that  the  algorithm  uses  the  OSL  algorithm  to  generate  the  next  iterate.  The 
drawback  of  Alenius  approach  is  that  the  OSL  algorithm  does  not  guarantee 
convergence. 

• In  Chapter  6,  we  propose  a model  for  emission  data  where  an  unknown  matrix, 
called  a scatter  matrix,  is  introduced.  The  model  aims  to  account  for  errors 
due  to  scatter  and  non-collinearity.  Based  on  the  model,  a certain  minimization 
problem  is  constructed  that  allows  for  the  scatter  matrix  and  emission  mean 
vector  to  be  jointly  estimated.  Since  the  minimization  problem  is  impossible  to 
solve,  we  propose  an  algorithm  that  greatly  reduces  the  number  of  unknowns 
in  the  scatter  matrix  and  alternately  estimates  the  scatter  matrix  and  emission 
mean  vector.  It  should  be  mentioned  that  Mumcuoglu  et  al.  [53]  used  the  same 
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model.  However,  they  assumed  that  the  scatter  matrix  is  known  and  accounts 
lor  errors  due  to  detector  penetration  as  well.  Their  scatter  matrix  was  obtained 
through  Monte-Carlo  simulations. 


CHAPTER  3 

PENALIZED  MAXIMUM  LIKELIHOOD  ALGORITHM 
Although  the  ML  estimates  of  the  emission  means  are  available  by  using  the 
MLEM  algorithm  [16],  as  discussed  in  Section  1.4,  the  resulting  PET  images  are 
extremely  noisy  due  to  the  fact  that  the  PET  image  reconstruction  problem  is  ill- 
posed.  This  is  because  of  short  scan  times,  errors  in  the  emission  data,  and  the 
fact  that  the  data  obey  Poisson  statistics.  The  most  popular  way  to  address  the  ill- 
posed  nature  of  the  PET  image  reconstruction  problem  is  through  the  use  of  penalty 
functions.  Penalty  functions  used  in  PET  are  designed  in  such  a way  that  estimates 
lor  the  emission  means  of  neighboring  voxels  are  forced  to  be  similar  in  value,  unless 
there  is  an  “edge”  within  neighborhood.  By  an  edge,  we  mean  that  there  is  a group 
of  connected  voxels  that  have  significantly  greater  activity  than  the  other  voxels  in 
neighborhood.  For  example,  suppose  there  is  only  one  voxel  with  significantly  greater 
activity  than  the  other  voxels  in  its  neighborhood.  Then,  we  would  say  that  there  is 
no  edge  within  the  neighborhood.  Simply  stated,  penalty  functions  provide  a means 
for  reconstructing  PET  images  that  have  considerably  less  noise  than  MLEM  images, 
yet  retain  edges  (e.g.,  tumors)  which  may  convey  important  information. 

In  Section  3.1,  we  first  derive  an  algorithm,  called  the  penalized  maximum  like- 
lihood (PML)  algorithm,  that  incorporates  a wide  class  of  edge-preserving  penalty 
functions.  Then,  we  prove  that  the  PML  algorithm  converges  in  Section  3.2.  Finally, 
we  summarize  the  properties  of  the  PML  algorithm  in  Section  3.3.  It  should  be  men- 
tioned that  we  presented  the  PML  algorithm  in  [18]  without  a proof  of  convergence. 
Our  proof  of  convergence  can  now  be  found  in  a recent  manuscript  [19]. 
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Figure  3 1:  One-dimensional  illustration  of  the  optimization  transfer  method.  At 
each  iteration,  a surrogate  function  is  obtained  and  a minimizer  of  the  surrogate 
function  is  defined  as  the  next  iterate.  Ideally,  it  is  “easy”  to  get  the  minimizer  of 
the  surrogate  function. 

3.1  Penalized  Maximum  Likelihood  (PML)  Algorithm 

The  problem  of  interest  is  to  determine  the  nonnegative  minimizer  of  the  PML 
objective  function  $(*)  = -<i>(x)+f3A(x)  (T  is  defined  in  (1.6)),  where  A is  a penalty 
function  that  forces  emission  mean  estimates  of  neighboring  voxels  to  be  similar  in 
value.  In  other  words,  we  want  to  solve  the  following  optimization  problem: 

(P)  2! PML  = arg  min  $(a:)  . 

x>0 

The  penalty  functions  we  consider  are  of  the  form 

j 

Mx)  = 51  Y!  uik9(?j,xk)  , (3.1) 

3= 1 keNj 

where  Nj  is  a set  of  voxels  in  a neighborhood  of  the  jth  voxel,  the  constants  {a ijk}  are 
positive  weights  for  which  ujk  = wkj  for  all  j and  k,  and  g(s,  t ) = A(s  - t)  whereby 
the  function  A satisfies  the  following  assumptions: 

• (AS3)  A(i)  is  symmetric 

• (AS4)  A (t)  is  everywhere  differentiable 
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• (AS5)  A (t)  — jt\ (t)  is  increasing  for  all  t (assumption  implies  that  A is  strictly 
convex) 

• (AS6)  7 (t)  = is  nonincreasing  for  t > 0 

• (AS7)  7(0)  = liin7(f)  is  finite  and  nonzero 

• (AS8)  A (t)  is  bounded  below  (assumption  implies  that  A(x)  is  bounded  below). 
Examples  of  functions  that  satisfy  (AS3)-(AS8)  are  the  quadratic  function  A(f)  = t2 
and  Green’s  log-cosh  function  A(f)  = log(cosh(f))  [23].  Regarding  the  neighborhood 
Nj,  the  jth  voxel  is  excluded  from  the  set  Nj  and,  if  the  kth  voxel  is  in  Nj,  then  the 
j,h  voxel  is  in  N^.  A common  choice  for  Nj  is  the  eight  nearest  neighbors  of  the  jth 
voxel. 

Since  it  is  not  possible  to  get  a closed-form  solution  to  the  minimization  problem 
(P),  iterative  optimization  methods  are  necessary.  The  PML  algorithm  we  propose  is 
based  on  the  optimization  transfer  method  [10,25,30,34,54]  where,  at  each  iteration, 
a function  that  satisfies  certain  conditions  is  obtained  and  the  next  iterate  is  defined 
to  be  a minimizer  of  the  function.  The  function  found  at  each  iteration  is  referred 
to  as  a surrogate  function  for  the  function  to  be  minimized.  This  idea  is  illustrated 
with  the  one-dimensional  example  in  Figure  3 1.  In  the  figure,  the  problem  is  to 
find  the  minimizer  of  the  function  /,  which  is  t*.  It  is  assumed  that  a closed-form 
solution  is  not  available  to  the  minimization  problem.  Given  an  initial  guess  A°\  a 
surrogate  function  /(°>  that  depends  on  <(0)  is  determined.  Then,  the  next  iterate 

is  generated  by  finding  the  minimizer  of  /i°b  To  get  the  following  iterate  t^2\  a 
surrogate  function  that  depends  on  is  obtained  and  then  minimized.  These 
steps  are  repeated  until  some  convergence  criterion  is  met. 

For  a vector  argument  t,  a surrogate  function  /(n)  satisfies  the  following  condi- 
tions: 

• (Cl)  f^n\t)  > f(t)  for  all  t € {domain  of  /} 

• (C2)  /G)(tG))  = /(£(«)) 
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• (C3)  V/(n)(<("))  = V/(£(n)), 

where  t1"'1  is  the  nth  iterate,  V denotes  the  gradient  of  a function,  and  the  superscript 
(n)  indicates  that  the  functions  {f^n)}  and  the  iterates  {£(n)}  depend  on  the  iteration 
number.  The  next  iterate  £(n+1)  is  defined  to  Ire  a minimizer  of  /(n): 

t("  1 = argrmn  f^n\t)  subject  to  £ € {domain  of  /}  . (3.2) 

Defining  the  iterates  in  this  way  insures  that  the  objective  function  / decreases 
monotonically  with  increasing  iterations.  To  prove  this  fact,  we  first  note  that 
/(*(n+1))  < /(n)(£(n+1))  by  (Cl).  Since  < /(")(£(">)  by  (3.2),  it  follows  by 

(C2)  that  for  all  n, 

/(£(n+1))  < /(«)(£(«+!))  < /(n) (£<”))  = /(£("))  . (3.3) 

It  should  be  mentioned  that  (C3)  is  not  necessary  for  the  monotonicity  in  (3.3). 
However,  (C3)  is  often  needed  in  order  to  prove  an  algorithm  that  utilizes  the  op- 
timization transfer  method  converges  (see  [18,25,30]).  Although  the  optimization 
transfer  method  is  straightforward  in  principle,  the  difficulty  in  practice  is  that  it 
may  be  difficult  to  find  surrogate  functions  that  satisfy  the  conditions  (Cl),  (C2), 
and  (C3). 

De  Pierro  developed  surrogate  functions  for  the  negative  log  likelihood  function 
by  using  the  convexity  of  the  negative  log  function  [25,30].  His  idea  is  based  on 
the  following  property  of  convex  functions  [55,  pp.  860-861]:  For  a convex  function 
/, 

f ( ajtj) — aj  / (tj)  (3-4) 

j j 

where  Yhjajtj  € {domain  of  /},  tj  € {domain  of /}  for  all  j,  aj  > 0 for  all  j, 
'Yjjdj  = 1,  and  {domain  of  /}  is  a convex  set.  Specifically,  De  Pierro  utilized  the 
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following  inequality  in  ML  estimation  where  /(f)  = - log(f): 

’Pijx'j"  pPx<”>] 


/(MO  = / £ 


[PsWjj  x(n) 


< y VijX^}  f(^x{n)^x  \ 


(3.5) 

(3.6) 


where  x<")  is  the  nth  iterate  of  x.  Vi:  > 0,  x 'Jn)  > 0,  and  [Px^]t  ± 0 for  all  i,  j,  and 
n.  Let 


/,(*)  = f([Px)  j) 

/7'w  * 


(3.7) 

(3.8) 


Then,  it  is  straightforward  to  see  that,  for  all  i,  (1)  fln\x)  > fi(x)  for  all  x > 0, 
(2)  /|n)(xW)  = ^(xW),  and  (3)  V/(n)(xW)  = V/^W).  Thus,  for  all  i,  f\n)  is  a 
surrogate  function  for  /;. 

Although  De  Pierro  developed  the  surrogate  functions  for  the  log  likelihood  func- 
tion under  the  assumption  of  no  accidental  coincidences  (i.e.,  p = 0),  the  surrogate 
functions  can  be  easily  modified  to  account  for  accidental  coincidences.  Observe  that 
[Px  + p\i  can  be  written  as  a convex  combination 


\vx+p}- - y VijX^]  \Vx(n) + d 
[Px(n)  + ph  y 


Xj  + 


pi 


[V  a?(n)  + p], 


[PxW  + p],.  (3.9) 


Using  the  convexity  of  the  negative  log  function  and  the  fact  that 

J v .J") 

, Pi  1 

[px(n)  pj . + p]j  ’ 

we  have  the  following  inequality  for  /(f)  = — log(f): 

V,iX{r]  ( [Px(n)  + p\i 


(3.10) 


/(P-H.)  < 


„(n) 


— ( [PxW  -I-  p\i 
[Va :M  + p\if(jrxin)  + p]l)  • 


(3.11) 
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Given  the  inequality  in  (3.11),  the  surrogate  function  at  the  nth  iteration  for  the 
negative  log  likelihood  function  —'I'  can  be  expressed  as 


V\jxj 


(n) 


-¥n\x)  = V { [Px]i  - V di ... 

^ I J y [TUcH  + p]t 


l°g(^j)  } + c[n) , 


(3.12) 


i=l 


where 


Cin)  = {p»+1°g(di!)-daog([iPa:(w)  + pji)+y  di { log(af)}  . (3.13) 

L l Vx(n>\i  + pi  J 

It  is  straightforward  to  show  that  the  surrogate  function  \lAn)  satisfies  (Cl),  (C2), 
and  (C3):  (1)  \k(n)(a:)  < 'F(x)  for  all  x > 0,  (2)  vp(Tl)(x("))  = ^/(aA")^  and  (3) 
W(n)(®(n))  = V'k(a:(n)). 

Since  surrogate  functions  for  the  negative  log  likelihood  function  — are  avail- 
able, we  only  need  to  find  surrogate  functions  for  the  penalty  function  A (a?)  = 
Sj  12 k — xk).  Under  assumptions  (AS3)-(AS8),  Huber  developed  a sur- 

rogate function  for  A in  [44]  (see  also  [8]).  Given  an  arbitrary  point  t^n\  Huber’s 
surrogate  function  for  A,  which  is  defined  by 


A {n\t)  = A y>)  + A {t(n)){t  - i<">)  + - f(n))2  , (3.14) 


has  the  property  that  A(n)(t)  > A (t)  for  all  t (see  Appendix  A),  A(n)(t(n))  = A (t(n)), 

- (n)  . 

and  A (vn’)  = A(U’h),  where  the  dot  over  a function  represents  its  first  derivative. 
For  — x^\  it  follows  that  a surrogate  function  for  A is 


aw(*)  = E E , 

j= 1 keNj 


(3.15) 


where  ^(s^)  = A(”)(s  _ ^ 

Using  A^n)  as  a starting  point,  we  will  now  construct  a surrogate  function  for  A 
that  has  a more  convenient  form.  By  the  convexity  of  the  square  function,  we  have 
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the  following  inequality 


It  should  be  mentioned  that  De  Pierro  [30]  and  Hsiao  et  al.  [34]  utilized  this  property 
of  the  quadratic  function  in  the  PET,  and  Erdo^an  and  Fessler  [10]  applied  it  to  a 
non-quadratic  convex  function  for  transmission  tomography.  Motivated  by  (3.16),  we 
define 


9{n)(x„xk)  = \{x™-x™)  + \(x 


(n) 


„(«K 


( n ) 


X 


(n) 


) (XJ 


X 


(n) 


) - (xk 


X 


M) 


+ -Mxf]  - x[n))  [(2 xj  - 2x[Jl)f  + (2xk  - 2x[n)) 


(3.17) 


By  construction,  the  following  statements  are  clear:  (1)  g(n\xj,xk)  > g(xj,xk)  for 
all  xj  and  xk  from  (3.16),  (2)  g(n)(x{^  x[n))  = g{xf\  x^),  (3)  £- g(n)  {x^ , x[n) ) = 
£~g(x<j  \xk  *)>  and  (4)  -^g(u){x *n),x[n))  = ^g{XjU\x [c*).  The  difference  between 
9^  and  g ^ is  that  g is  de-coupled  in  the  sense  that  it  does  not  have  terms  of  the 


form  XjXk.  This  difference  is  important  because,  as  we  show  later,  using  g ^ enables 
us  to  construct  surrogate  functions  for  (E>  that  have  closed  form  expressions  for  their 
minimizers.  Using  g(n\  an  alternative  surrogate  function  for  A is 

j 

A(n)(x)  = ujkg(n\xjiXk)  ■ (3.18) 

j= 1 keNj 


It  is  clear  that,  by  construction,  A^  satisfies  the  following  properties:  (1)  A^n^(cc)  > 
A(®)  for  all  x,  (2)  A(n>(®(n))  = A(a:(n)),  and  (3)  VA(n)(®(n))  = VA(cc('l)). 

Now,  using  'P(n)  and  A(n),  the  desired  surrogate  function  at  the  nth  iteration  for 

4>  is 


${n)(*)  = -T(n)(®)  +/?A(n)(aO  . 


(3.19) 
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From  the  properties  of  and  A(n),  it  follows  that  the  surrogate  function  <f>(n) 
possesses  the  requisite  properties: 

• (PI)  for  all  x > 0 

• (P2)  = $(*(u)) 

• (P3)  V<J>(n)(a;(”))  = V$(*(n)). 

Given  x^n\  the  next  iterate  a;(n+1)  is  found  by  minimizing  <h(n): 


Defining  the  iterates  in  this  way  insures  that  the  objective  function  $ decreases  with 
increasing  iterations  as  shown  in  (3.3): 


All  that  remains  now  is  to  solve  the  optimization  problem  in  (3.20).  To  this  end, 
we  write  AG)(aj)  as 


cc('l+1)  A arg  mjn  $(n)(a;)  . 


x>0 


(3.20) 


$(a:(n))  > $(*(n+1))  for  all  n. 


(3.21) 


j 


(3.22) 


3=1  k&Nj 


4n)x*5n)  - 4n))  (3.25) 


(3.24) 


(3.23) 


where 
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(see  proof  in  Appendix  B).  Since  tA")  and  are  de-coupled,  <f4ra)  can  be  written 


clS 


4>(n)(a;)  = -¥n\x)  + /3A{n\x) 

1 ( 1 v 

= E ‘ 1 


(3.26) 


where 


i= 1 


“ ' [VxW  + p]i 


log  (Xj) 


(n) 


+2p  T,T.h'p’^i)+c++pc^ 

j= 1 keNj 

J (I  I sp  (n) 

E{^Epu-iog(xj)|:d.I^-] 


(3.27) 


-(«)  i ar<(.n) 


E 2/3  E W(4"’  - 4"’)  (4  - 2 m<gXj  + 


+ cr  + pc: 


j= i I teiv 


= + f3h)xj  + + ci 

3=1 


i(n) 

3 i 


(3.28) 

(3.29) 


Ejn,  = -Ed 


VijXj 


(n) 


~(  1 [PxW  + p\i 


F+  & 2/3  £ w( 

feeWj 

/ 


A")  _ „(")'! 

x/t  ; 


Gin)  = E 51  wfr?0  - 


8=1 


fceATj 


c(")  = ^(n)  + £<-(")  + 2q  EE  ^3kl{xf)  ~ x[n))m 

j= 1 fcGWj 


‘3k 


(3.30) 

(3.31) 

(3.32) 

(3.33) 


Since  $1”)  is  de-coupled,  as  seen  from  (3.29),  it  follows  that  the  solution  to  (3.20)  is 
given  by 


x 


(n+l) 


= arg min  ^(Xj)  , j = l,2,...,J. 


xj>  0 


where 


= E(;]  log  (t)  + Fjn)t2  + Gf]t  . 


(3.34) 


(3.35) 
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Fortunately,  the  function  (f)^  is  strictly  convex  for  all  j and  n under  the  assump- 
tion that  Xj  > 0 for  all  j and  n.  We  will  prove  this  statement  by  showing  that  the 
second  derivative  of  ^ is  positive,  when  > 0 for  all  j and  n.  First,  note  that 
Ejn)  is  negative  and  p'f'  is  positive  for  all  j and  n.  The  fact  that  E(-l)  > 0 is  due  to 
the  fact  that  dVj^O  (see  (AS2)  in  Section  1.4)  and  the  assumption  that  r^71'1  > 0. 
The  fact  that  Fjn)  > 0 follows  from  the  positivity  of  the  function  7,  weights  {ujjk}, 
and  penalty  parameter  /3.  To  see  why  7 (t)  > 0 for  -00  < t < 00,  recall  that  A (t)  is 
a symmetric,  strictly  convex  function  by  (AS3)  and  (AS5).  It  follows  that  A (t)  > 0 
over  (0,  00)  and  A (t)  < 0 over  (— 00, 0).  Using  the  fact  that  7(0)  is  finite  and  nonzero 
(see  (AS7)),  we  have  that  7 (t)  > 0 for  —00  < t < 00.  Now,  consider  the  second 
derivative  of  <^n).  Easy  calculations  show  that  <p^\t)  — (~E^n)/t2  + 2Fjn)),  where 
the  double  dot  over  a function  represents  the  second  derivative  of  a function.  Since 
Fj  ^ > 0 and  EIJ"'1  < 0,  it  follows  that  the  second  derivative  of  <pjH\t)  is  positive  for 
all  j and  n.  Thus,  is  strictly  convex  for  all  j,  n,  and  t > 0,  and,  from  (3.29), 

T(n)  is  strictly  convex  over  the  set  {x  : x > 0}. 

Since  is  de-coupled,  is  strictly  convex,  and  ) — > 00  as  t — > 0+,  it  is 
true  that 

cc(n+1)  > 0 and  <^n)(xJn+1))  = q . (3.36) 


Note  that  (3.36)  satisfies  our  assumption  that  > 0 for  all  j and  n.  To  solve  (3.34), 
we  compute  the  first  derivative  of  <^n)  and  set  it  to  zero.  Since  E^n)  < 0 and  Fjn)  > 0, 
the  root  of  the  quadratic  equation  that  preserves  the  nonnegativity  constraint  is 

-G[ 


(n+l)  _ 


?r> + ji 


G(n)2  _ 8EWp 


-(n)  7-r(n) 


4 F 


(n) 


, j = 1,2,  ...,J  . 


(3.37) 


Observe,  as  f3  — > 0,  (3.37)  approaches  —E^/  Pij  by  L’Hospital’s  rule.  Thus,  the 
iteration  in  (3.37)  is  equivalent  to  the  MLEM  algorithm  when  (3  = 0 . 
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In  summary,  given  a strictly  positive  initial  estimate  a:(0)  > 0,  the  steps  of  the 
PML  algorithm  are:  for  n = 0, 1,  2, . . . 

• Step  1 Let  cc(l))  > 0 be  the  initial  estimate 

• Step  2 Construct  the  surrogate  function  4>(n)  from  the  current  iterate  x(n)  using 
(3.29),  (3.30),  (3.31),  and  (3.32) 

• Step  3 Get  xn+l  using  (3.37) 

• Step  4 Iterate  between  Steps  2 and  3 until  some  chosen  stopping  criterion  is 
met. 


3.2  Convergence  Proof 

Using  (P1)-(P3)  and  (AS1)-(AS8),  we  now  prove  that  the  PML  algorithm 
converges.  The  following  convergence  proof  is  based  on  the  convergence  proof  by 
Lange  and  Carson  [56]  (see  also  [30])  of  the  MLEM  algorithm  by  Sliepp  and  Vardi  [16]. 

By  (3.21),  the  PML  algorithm  has  the  property  that  it  decreases  the  objective 
function  $ with  increasing  iterations: 

• (P4)  $(a;(n+1>)  < $(*("))  for  all  n > 0. 

Another  property  of  the  algorithm  is  that 

• (P5)  the  sequence  {4>(a:(n))}  is  convergent. 

This  property  follows  from  (P4)  and  the  fact  that  4>  is  bounded  below  by  (AS8) 
(see  [57,  Theorem  1.4,  p.  6]). 

Proposition  1 The  sequence  {a^")}  is  bounded. 

Proof : From  (P4),  it  follows  that  <P(x(n))  < <P(x(0))  for  all  n > 0.  Consider 
the  set  B = {x  > 0 : 4>(cc)  < 4>(cc(0))}.  Then,  clearly  {a:^}  C ®.  So,  to  prove  {x^n)} 
is  bounded,  we  will  prove  that  the  set  B is  bounded.  It  is  straightforward  to  see  that 
B is  bounded  below  by  0.  Now,  suppose  that  B is  not  bounded  above.  Then,  there 
exists  a point  z e B such  that  ||z||  oo.  This  result  means  that  for  some  j there 
exists  Zj  such  that  zj  oo.  Since  (AS2)  implies  that  V has  no  column  vector  of 
zeros,  it  follows  that  [Pz\i  ^ oo  for  some  i and  $(z)  oo.  This  implies  that  z is  not 
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an  element  of  the  set  IB  because  all  the  elements  ol  B have  an  objective  value  that  is 
less  than  or  equal  to  < oo,  which  is  a contradiction.  Therefore,  the  set  B is 

bounded  above.  ■ 

Proposition  2 There  exists  some  constant  cj  > 0 such  that  $(x("))  — <f>(x(n+1))  > 
Cj  ||x(n)  — x(n+1)||2  for  all  n > 0. 

Proof  : By  (PI),  (P2),  and  (3.29),  we  have  the  following  inequality 

j 


J 

<f>(x(n))  - $(x(n+1))  > $<n>(x(n))  - 4>(n)(x("+1))  = ]T  {^n)(^n)}  - ^n)(xJ"+1))|  . 

3 = 1 

(3.38) 

Suppose,  for  each  n,  the  function  4>^\t)  is  expanded  into  a second-order  Taylor 
series  [55,  pp.  868-869]  about  the  point  x j”+1)  and  evaluated  at  t = xjn).  Then,  the 
right  hand  side  of  (3.38)  can  be  written  as 

$(»)(*("))-$(")(*("+!))  = J2{  (4n)  -^n+1))^n)  (4”+1))+^  (^n) -xf+l))24>f]  (xf+1))  } , 

3= 1 2 

(3.39) 

where  the  double  dot  over  a function  represents  its  second  derivative  and  a -1 ' 1 1 is  a 
point  between  x{-l)  and  x{J1'  u.  Since  <^n)(xjn+1))  = 0 by  (3.36),  it  follows 

J 1 

$W(x(n))  - <f>(n)(x(n+1))  = J]  - xf+x)Y'<j>f\xf+l))  . (3.40) 

3= 1 

Now,  recall  that  $n)(t)  = ( -E$n)/t 2 + 2 Fjn))  with  F$n)  = 2(3  ZkeNj  ^(xf  - x<n)) 
and  < 0.  Since  {x^}  is  bounded  and  7 (t)  > 0 is  a continuous  function  for 
-00  < t < 00,  there  exists  a number  70  > 0 such  that  7(xjn)  - x[n))  > 70  for  all 
j , k,  and  n.  Hence,  F jn^  > cj  for  all  j and  n,  where  Ci  = 2(3^q  min  Ylk<=N  > 0- 
Therefore,  <j)^\xj)  > 2ci  for  all  j and  n,  and  we  obtain  the  desired  result 


4>(x(n))  - $(x(n+1))  > ci  ||x(n)  - x(n+1)||2  . 


(3.41) 
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From  (P5)  and  Proposition  2 , it  follows  that 

• (P6)  the  sequence  {ad'd  — ad"+d}  converges  to  0. 

The  following  proposition  will  be  used  later  to  prove  not  only  that  a limit  point 
of  the  sequence  {cc( ,l) } satisfies  one  of  the  Kuhn-Tucker  conditions  [55,  p.  777]  but 
also  that  the  sequence  {x1'1'1 } has  a finite  number  of  limit  points. 

Proposition  3 Let  x*  be  a limit  point  of  the  sequence  jad'd}.  Then,  for  all  j such 


that  x*  ^ 0, 


^f{x] 


= 0 


x=x * 


(3-42) 


Proof : By  Proposition  1 , there  is  a subsequence  jad”d}  that  converges  to  x* 
(see  the  Bolzano- Weierstrass  theorem  in  [58,  p.  108]).  By  (P6),  the  subsequence 
{ajlni+i)}  aiso  converges  to  x*.  Recall  from  (3.29)  that  4>^l\t)  = £•"'> /t  + 2 + 
G^l\  If  x*  ^ 0,  it  follows  that  E^/x^  and  / x^l+l>>  converge  to  the  same 

limit,  and  hence 


lim  0("')(x(n'))  = lim  ^"')(a:(n'+1))  . (3.43) 

l—>  oo  J J l—>  oo  J J 

Since  4>fl){xfl+1))  = 0 by  (3.36),  it  follows  from  (P3)  that 


_d_ 

dx 


$(cc) 


X=X‘ 


= 0 for  all  j such  that  x*  ^ 0 


(3.44) 


Using  Proposition  3,  we  can  prove  the  following  proposition,  which  will  be  used 
to  prove  that  the  sequence  jad'd}  converges. 

Proposition  4 The  sequence  jad'd}  has  a finite  number  of  limit  points. 

Proof  : Consider  the  following  sets 

Y = {1,  2, . . . , J} 

Z*  = {j  6 Y : x*  = 0} 

Z**  = {je  Y : x**  = 0}  , 


(3.45) 

(3.46) 

(3.47) 
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where  x*  and  x**  are  limit  points  of  {adn)}.  Now.  let  the  function  <f>*(cc)  be  the 
restriction  of  4>(a;)  to  the  reduced  parameter  set  W*  = {a;  > 0 : Xj  = 0 for  j 6 Z*}. 
Since  (f>*(a;)  is  strictly  convex  over  a convex  set,  the  unique  minimizer  of  <f>*(ai)  is  its 
only  stationary  point  [59,  Proposition  2.1.2,  p.  87].  Thus,  by  Proposition  3,  there 
is  only  one  limit  point  of  {adn)}  in  the  set  W*.  In  other  words,  if  Z*  = Z**,  then 
x*  = x**.  Therefore,  the  number  of  limit  points  are  bounded  above  by  the  number 
of  subsets  of  ¥,  which  is  clearly  finite.  ■ 

Theorem  1 The  sequence  {ad")}  converges  to  the  unique  minimizer  o/T. 

Proof : Let  x*  be  a limit  point  of  the  sequence  {ad")}.  Using  the  theorem 

in  [60.  p.  173],  which  says  that  the  set  of  limit  points  of  a bounded  sequence  {ad")} 
is  connected  if  {ad")  — ad"+1)}  — > 0,  we  obtain  the  fact  that  the  set  of  limit  points  of 
{ad")}  is  connected  by  Proposition  1 and  (P6).  Since  the  number  of  limit  points  of 
{adn)}  is  finite  by  Proposition  4,  {ad")}  has  only  one  limit  point.  Thus,  {ad")}  — > a;*. 
Now,  note  that  the  PML  objective  function  <4>  is  strictly  convex  on  the  set  {a:  : x > 0} 
(see  Appendix  C),  so  there  is  only  one  minimizer.  To  prove  the  sequence  {ad")} 
converges  to  the  unique  minimizer,  we  need  to  show  that  x*  satisfies  the  Kuhn- Tucker 
conditions  [55,  p.  777]:  for  all  j, 


x>-L*{x) 

* 

xt 

> 

0 

(3.48) 

x=x • 

= 

0 

(3.49) 

-k*(x) 

x=x * 

> 

0 . 

(3.50) 

Since  ad")  > 0 for  all  n,  it  must  be  true  that  the  limit  point  x*  is  nonnegative  (i.e., 
(3.48)  is  satisfied).  By  Proposition  5, 


_d_ 

dxj 


4>(a:) 


x=x 


= 0 for  all  j such  that  x*  ^ 0 . 


(3.51) 
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So,  (3.49)  is  satisfied.  Now,  we  consider  the  case  x*  = 0.  For  j such  that  x*  = 0, 


suppose 


d 

$(* 

dxj 


< 0 


x=x* 


(3.52) 


Then,  it  follows  that  lim  j>[n\xP)  < 0 by  (P3),  and  (f){Jl> (x(Jl>)  < 0 for  sufficiently 

n — J J J J 


large  n.  Consider  $n)(z$n+1))  = /xf+l)  + 2 Fjn)x'?+1)  + G™ . If  z}n+1J  < xf\ 


An ) /^.(n+1) 


(«)^.(n+1)  . r<(n) 


Jn+l)  (n) 


then 


(n+l)  ri(«) 

r — Httt  + 2F}n)^n+1)  + G'n)  > 0 
„(«)  (n+l)  J J J 


(3.53) 


because  ^"\x^n+1^)  = 0 by  (3.36)  and  /x^+V>  < 0.  Moreover,  the  fact  > 0 
implies  that 


E 


(n) 


-hr  + + G<”>  = $\zf ')  > 0 , 

x\ 


(3.54) 


which  is  a contradiction.  Thus,  x^n  1 1 ^ > xp]  for  all  sufficiently  large  n.  ffowever, 
this  contradicts  the  fact  x(p  — ► 0.  So,  it  is  true  that 


x=x * 


> 0 for  all  j such  that  x*  = 0 


(3.55) 


This  satisfies  (3.50).  ■ 

3.3  Properties  of  the  PML  Algorithm 

We  now  provide  a summary  of  the  desirable  properties  of  the  PML  algorithm: 

• The  PML  algorithm  is  straightforward  to  implement  because  there  are  no  hyper- 
parameters required  for  the  algorithm  itself  and  it  has  closed-form  expressions 
for  the  iterates.  Some  algorithms  require  hyperparameters,  such  as  relaxation 
parameters,  in  addition  to  the  penalty  parameter  [33,35],  while  others  [24,30] 
do  not  have  closed-form  expressions  for  the  updates. 

• The  PML  algorithm  theoretically  guarantees  nonnegative  iterates,  whereas  some 
algorithms  [33, 35]  set  any  negative  element  of  the  iterates  to  a small  positive 


number. 
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• The  PML  algorithm  monotonically  decreases  the  PML  objective  function  unlike 
the  algorithms  in  [23,35]. 

• The  PML  algorithm  can  incorporate  a large  class  of  edge-preserving  penalty 
functions  unlike  the  algorithm  by  De  Pierro  [30]. 

• The  PML  algorithm  converges  to  the  minimizer  of  the  PML  objective  function. 
Convergence  proofs  for  the  algorithms  in  [23, 33]  are  not  available. 


CHAPTER  4 

ACCELERATED  PENALIZED  MAXIMUM  LIKELIHOOD  ALGORITHM 

Although  the  PML  algorithm  presented  in  Chapter  3 converges  to  the  nonnega- 
tive minimizer  of  the  PML  objective  function,  it  has  the  drawback  that  it  converges 
slowly.  In  PET,  a popular  way  to  accelerate  iterative  image  reconstruction  algo- 
rithms is  through  the  use  of  so  called  ordered-subsets  [45].  In  ordered-subsets  based 
reconstruction  algorithms,  the  observed  data,  d.  is  divided  into  a predefined  num- 
ber of  subsets  via  some  chosen  rule.  Then,  the  iterative  reconstruction  algorithm 
to  be  accelerated  is  applied  sequentially  to  each  data  subset.  In  [45],  Hudson  and 
Larkin  developed  the  first  PET  image  reconstruction  algorithm  that  used  the  ordered- 
subsets  idea.  Since  the  MLEM  algorithm  was  applied  to  each  data  subset,  they  called 
their  algorithm  the  ordered-subsets  expectation  maximization  (OS-EM)  algorithm. 
In  [61],  Browne  and  De  Pierro  showed  that  the  OS-EM  algorithm  did  not  converge 
and  introduced  another  ordered-subsets  based  image  reconstruction  algorithm  that 
employed  a relaxation  parameter.  It  should  be  pointed  out  that  some  convergence 
results  are  available  for  ordered  subsets  based  algorithms  that  use  relaxation  param- 
eters [33,35,61], 

With  ordered-subsets  based  algorithms,  there  is  uncertainty  as  to  how  many 
subsets  to  be  used  and  how  the  data  should  be  divided.  Moreover,  it  is  not  clear  how 
relaxation  parameters  should  be  chosen  in  practice  because,  generally,  they  depend 
on  the  data. 

In  this  chapter,  we  introduce  an  accelerated  version  of  the  PML  algorithm,  re- 
ferred to  as  the  accelerated  PML  (APML)  algorithm,  that  uses  a pattern  search 
suggested  by  Hooke  and  Jeeve  [41,  pp.  287-291].  A pattern  search  has  also  been 
exploited  to  accelerate  an  algorithm  in  the  transmission  tomography  [62],  In  Section 
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Figure  4 1:  Two-dimensional  illustration  of  the  sequence  {x(n)}-  The  single  circles 
and  double  circles  denote  the  accelerated  iterates  {x(n^}  and  standard  iterates  {*(”)}, 
respectively.  Each  ellipse  represents  a set  of  points  that  have  same  cost.  The  mark  x 
denotes  the  minimizer  of  the  function  subject  to  the  constraints  xx  > 0 and  x2  > 0. 

4.1,  using  the  mathematical  ideas  in  the  convergence  proof  of  the  PML  algorithm, 
we  show  that  a sequence  that  satisfies  certain  conditions  converges  to  the  minimizer 
of  the  PML  objective  function.  Then,  we  use  this  result  to  prove  that  the  APML 
algorithm,  which  is  developed  in  Section  4.2,  converges  to  nonnegative  minimizer  of 
the  PML  objective  function.  In  Section  4.3,  we  introduce  the  direction  vector  to  be 
used  in  the  pattern  search.  Finally,  we  summarize  the  properties  of  the  APML  algo- 
rithm in  Section  4.4.  It  should  be  mentioned  that  we  introduced  the  APML  algorithm 
in  [19], 

4.1  Convergence  Proof 

In  this  section,  we  prove  that  the  sequence  cc(1\  a;(2),  x^2\  . . .} 

converges  to  the  minimizer  of  the  PML  objective  function  $,  where  x<'0'1  > 0 is  an 
initial  guess,  a;(n+1>  = argmin  4>( n\x ) subject  to  x > 0,  4>(n)  is  of  the  same  form 
of  the  surrogate  function  for  the  PML  objective  function  $ at  the  iterate  x(n\  4>(rl) 
in  (3.29),  except  that  4^  J is  defined  at  the  point  x ^ instead  of  x and  the  point 
x(n)  > 0 satisfies  the  following  conditions  for  all  n 
• (C4)  $(*<"))  < 4>(cc^) 
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• (C5)  there  exists  some  constant  c-2  > 0 such  that 
$(*<"))  - <L(£(n))  > c2  ||a;(n>  - x(n)||2. 

This  convergence  result  will  from  the  basis  for  the  convergence  proof  of  the  APML  al- 
gorithm. Note,  the  strict  positivity  of  x{n')  for  all  n is  necessary  because  the  surrogate 
function  for  the  PML  objective  function,  4>  , is  undefined  for  vectors  with  zero  or 

negative  elements.  An  example  of  such  a sequence  {x^}  is  illustrated  in  Figure  4 1. 
In  the  figure,  the  single  circles  and  double  circles  represent  the  accelerated  iterates 
and  standard  iterates  {a^n)},  respectively. 

First,  note  that  the  following  convergence  proof  is  mainly  based  on  the  conver- 
gence proof  of  the  PML  algorithm  in  Chapter  3.  Since  4>  is  bounded  below  and  the 
sequence  {x^}  monotonically  decreases  the  PML  objective  function  <F  by  (P4)  and 
(C4),  it  follows  that  (see  [57,  Theorem  1.4,  p.  6]) 

• (P7)  the  sequence  converges. 

By  (P4)  and  (C4),  it  is  straightforward  to  see  that  {x^}  C I = {1  > 0 : 4>(a;)  < 
4>(5(0))}.  Since  the  set  B is  bounded  (see  Proposition  i),  it  follows  that 

• (P8)  the  sequence  {x^}  is  bounded. 

Now,  note  that  cc(n+1)  is  the  minimizer  of  the  surrogate  function  4>l,  ),  which  satisfies 
(1)  4>{n)(*)  > $(*)  for  all  x > 0,  (2)  l>(n)(:K(n))  = <F(£(n>),  and  (3)  Vl»(n)(®(n))  = 
V4>(ce^).  Thus,  by  (P8)  (see  Proposition  2),  we  have  the  following  property: 

• (P9)  there  exists  some  constant  C3  > 0 such  that  d>(®^)  — d>(cc(n+1))  > 
c3  ||*(n)  -cc("+1)||2. 

By  (P7)  and  (P9),  we  obtain  the  property 

• (P10)  that  the  sequence  {x^  — adn+1)}  converges  to  0. 

Also,  by  (P7)  and  (C5),  it  must  be  true  that  the  sequence  {x^  — x^}  converges  to 
0.  Thus,  from  the  fact  that  ||x(7l)  - 5{n+1)||2  < ||*(n)  - x^n+^\\2  + ||x("+1)  - £(n+1)||2, 
the  property  follows: 

• (PH)  the  sequence  { x ^ — £^n+1^}  converges  to  0. 
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For  the  discussion  to  follow,  consider  the  surrogate  function  for  the  PML  objec- 
tive function  <f>  at  the  iterate  Pn\  <E»("\  (see  (3.29)) 


*W(*)  = Y,  >+<?!"’. 

3= 1 


(4.1) 


where  <f>j  \t)  = £']")  log(f)  -F  Fph2  + G^pt  for  t > 0,  Cp]  is  independent  of  x,  and 


'PijXj 


(n) 


tr  [vx^+P\t 

kzNj 

= A ~ ^ E - 4”))(4")  + 4"’ 

i= i fceWj 


(4.2) 

(4.3) 

(4.4) 


(note:  $l  \ Ep\  Pj"\  Gp\  an<^  resu^  by  substituting  x for  x in  (3.29),  (3.30), 
(3.31),  (3.32),  and  (3.33),  respectively).  To  prove  that  the  whole  sequence  {%(")} 
converges,  as  done  in  the  convergence  proof  of  the  PML  algorithm,  we  first  present 
the  following  proposition: 

Proposition  5 Let  x*  be  a limit  point  of  the  sequence  {x^}.  Then,  for  all  j such 
that  x*  ^ 0, 


_d_ 

dxj 


$(x) 


= 0 . 


x=x* 


(4.5) 


Proof : By  (P8),  the  sequence  {x(,l)}  is  bounded  and  there  is  a subsequence 
{x^}  that  converges  to  x*.  By  (P10),  the  subsequence  {x^71'"1-1^}  also  converges  to 
x*.  Recall  that  <pj  ( t ) = Ep1^ ft  + 2 Fp'h  + G(pK  Now,  if  x*  ^ 0,  it  follows  that 
Ep'1  /xY  and  Epl)/xp+l)  converge  to  the  same  limit,  which  in  turn  implies  that 

lW-  “>•  (4.6) 


lim  _ 

l—>  oo 


4>j  (xp’)  = P (x{p+1)) 

J J 1-*  oo  J J 


i(ni) 


dxj 


<f>(x) 


x=x 


= 0 for  all  j such  that  x*  ^ 0 . 


Since  4>]  (x<n,+1))  = 0 by  (3.36),  and  V$'”';(x(ni))  = V4>(x(”i}),  it  can  be  said  that 

d 


(4.7) 
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Theorem  2 The  sequence  {x^}  converges  to  the  unique  minimizer  o/4>. 

Proof  : Let  x*  be  a limit  point  of  the  sequence  {ct^n)}.  Since  the  set  of  limit 
points  of  the  sequence  {x(n)}  is  connected  by  (P8)  and  (Pll)  (see  [60,  p.  173]  and 
Theorem  1 in  Chapter  3)  and  there  are  a finite  number  of  limit  points  of  {ai^}  by 
Proposition  4 and  Proposition  5,  it  follows  that  there  is  only  one  limit  point  in  the 
set  of  limit  points.  Thus,  { x — » x*.  Since  lim{||i^  — cc*||2}  = 0,  by  (P10)  we 
have  lim{||a:(n+1)  — £c*||2}  = 0 (note:  ||z:(n+1)  — a;*||2  < || aJ(rl+1)  _ x^n^||2  + ||ai^  — 
x*||2).  Hence,  {x^"+1^}  — > x*  (i.e. , { x ("^}  — » a;*).  Therefore  we  can  deduce  that 
the  whole  sequence  {x^}  x* ■ To  prove  the  sequence  {x(T}  converges  to  the 
unique  minimizer  of  the  PML  objective  function,  we  must  show  that  x*  satisfies  the 
Kuhn- Tucker  conditions  (i.e.,  (3.48),  (3.49),  and  (3.50)).  Since  all  the  points  in  the 
sequence  {x^}  are  positive,  it  must  be  true  that  the  limit  point  x*  is  nonnegative 
(i.e.,  (3.48)  is  satisfied).  By  Proposition  5, 


l;Hx] 


X=X‘ 


= 0 for  all  j such  that  x*  ^ 0 . 


(4.8) 


Thus,  (3.49)  is  satisfied.  For  j such  that  x*  = 0,  suppose 

< 0 . 


-^—$(x) 


dx 


Then,  it  follows  that  lim  0-  ( x ^)  < 0 by  the  property  "'(x[n))  — V<3>(alw), 
and  (f)j  (x^)  < 0 for  sufficiently  large  n.  Consider  ^ (xj"+1^)  = / XjH+1^  + 

2F\n)xf+l)  + Gf\  If  x'n+1)  < xf\  then 

(n+l)  r(n) 

2 + 2 F(ny."+1)  + G{"]  > 0 


x=x * 


(4.9) 


(4.10) 


~ (74) 

because  ^ (x^n+1))  = 0 by  (3.36)  and  Ejn'>/xjl+1'>  < 0.  Moreover,  the  fact  that  Fj  is 
positive  implies  that 


?(n) 


X 


(n) 


i or’(n)~(n)  i /-lira)  .(n)/~(n)\  ^ n 
+ 2 F-  ’Xj  + Gj  ’ = Cfj  )(Xj)>  0 , 


(4.11) 


which  is  a contradiction.  Thus,  > x ^ for  all  sufficiently  large  n.  However, 


this  contradicts  the  fact  — » 0.  Therefore, 


x=x * 


> 0 for  all  j such  that  x*  = 0 . 


(4.12) 


This  satisfies  (3.50).  ■ 

4.2  Accelerated  Penalized  Maximum  Likelihood  (APML)  Algorithm 

In  Section  4.1,  we  showed  that  the  sequence  {x^}  = {a^°\  x^2\  x^2\  . . .} 

converges  to  the  nonnegative  minimizer  of  the  PML  objective  function  $ if:  (1) 
£c(0)  > 0,  (2)  a4n+0  is  the  nonnegative  minimizer  of  the  surrogate  function  for  the 
PML  objective  function  <f>  at  the  iterate  cc(n),  I*'"*,  and  (3)  > 0 satisfies  (C4) 

and  (C5).  fn  this  section,  we  present  an  algorithm  that  produces  such  a sequence 

{x(n)}- 

First,  consider  the  following  steps:  given  an  initial  guess  x (0^  > 0,  for  n = 
0,1,2,... 

• Step  1 Get  the  standard  PML  iterate  cc^"+1^  = argmin  ^^(a:)  subject  to 
x > 0. 

• Step  2 Get  the  accelerated  PML  iterate  = x^n+1^  + ain+1',i/n+1), 

where  n^n+1^  ^ 0 is  the  chosen  search  direction  (i.e.,  direction  vector)  and 

q.G+1)  = argmin  <f>(a^"+1)  + at/n+1))  . (4.13) 

a 


• Step  3 Repeat  the  steps  above  until  some  chosen  stopping  criterion  is  met. 
With  t/n+1)  = a4"+1)  — x^n\  Step  2 is  the  pattern  search  step  put  forth  by  Hooke 
and  Jeeve  [41,  pp.  287-291]. 
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We  now  modify  Step  2 so  that  the  sequence  produced  by  Steps  1.  2.  and  3 
converges  to  the  nonnegative  minimizer  of  the  PML  objective  function.  Step  2 does 
not  guarantee  that  the  accelerated  iterate  is  positive.  Consequently,  we  modify  the 
optimal  step  size  as  follows 

a !n+1)  = argmin  <f>(cc(n+1)  + ar(n+1))  subject  to  a G A(ra+1)  , (4.14) 

a 

where 

A(n+1)  = {a  : zjn+1)  + av\n+1)  > C(”+1)  for  all  j}  (4.15) 

a r("+1) 

C(n+1)  = min  — — — (4.16) 

j n + 2 v ' 

Observe,  lor  simplicity,  we  use  the  same  notation  for  the  optimal  step  sizes  defined  by 
(4.13)  and  (4.14).  For  a G A(,l+1^.  it  is  straightforward  to  see  that  a ;(n+1)+at;(n+1)  > o 
because  > 0 for  all  j and  n.  Thus,  by  adding  the  constraint  a € A^"+1^  to  the 

problem  in  (4.13),  it  follows  that  x(ri)  > 0 for  all  n.  Because  xjn+1)  > £(n+i)  for  all  j 
and  n,  it  is  evident  that  (4n+1)  has  been  chosen  in  such  a way  that  0 G A(n+1)  for  all 
n.  The  fact  that  0 G A^"+1^  will  be  used  later  to  prove  that  the  proposed  algorithm 
monotonically  decreases  the  PML  objective  function  4>  with  increasing  iterations. 

Remark  : If  we  allow  some  elements  of  x ^ to  be  zero,  then  the  feasible  region 
of  the  function  $(x(n+1)  + cru(n+1))  is  {a  : a^"+1)  + > 0 for  all  j}.  However, 

for  the  sequence  {x^}  to  converge,  we  must  constrain  all  elements  of  x ^ to  be 
positive  because  the  surrogate  function  for  the  PML  objective  function  4>,  $ , is 

not  defined  at  x = x ^ where  x ^ = 0 for  some  j.  A feasible  region  of  the  function 
$(a;(n+b  + a"ihn+1))  that  appears  natural  is 

A^n+p  a . x(rt+1)  _p  cnv^+V>  > 0 for  all  j}  . 


(4.17) 
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Figure  4 2:  Illustration  of  the  pattern  search  step:  (a)  two-dimensional  illustration 
of  (1>,  (b)  one-dimensional  slice  of  the  function  along  the  chosen  direction  vector.  The 
single  circle  and  double  circle  denote  an  accelerated  iterate  x ^ and  a PML  iterate 
cc("+1),  respectively.  The  mark  x denotes  the  minimizer  of  the  function  subject  to  the 
constraints  x\  > 0 and  %2  > 0. 

However,  the  set  A[" ' 1 1 is  an  open  set.  Consequently,  the  optimization  problem 

argmin  <F(a:(n+1)  + subject  to  a G Ain+1^  (4-18) 

Q 

may  not  have  a solution.  To  see  this  fact,  suppose  that  the  function  <F(cc^"+1^  4- 
a«(n+1))  is  strictly  convex  and  Ao”+1)  = {a  : Lq"+1)  < a < U^1^}  for  some 
and  t/d”+1\  If  the  function  <5(a;ffi+1)  + at)fn+1))  has  as  its  minimizer  a = Lq"+1^  over 
the  set  {a  : L^+1)  < a < then  the  optimization  problem  in  (4.18)  does  not 

have  a solution. 

One  more  point  to  mention  about  the  set  A(n+1)  is  that,  strictly  speaking,  (d"+i) 
can  be  any  positive  number  as  long  as  it  is  chosen  in  such  a way  that  0 G A(n+1)  for 
all  n.  However,  in  addition  to  the  condition  0 G A^n+1),  it  is  reasonable  to  keep  the 
set  A(n+1)  as  close  as  possible  to  Aq"+1)  in  (4.17).  In  this  sense,  the  chosen  £(n+1)  in 
(4.16)  is  optimal  when  n is  sufficiently  large.  ■ 

Another  issue  with  Step  2 is  that  it  is  hard  to  solve  the  optimization  problem 
in  (4.13).  A sub-optimal  solution  can  be  found  by  determining  a surrogate  function 
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for  4>„’fl^(a)  = 4>(aAJ+b  4-Q-w*n+1^),  which  we  denote  by  r(n+1)(a),  that  satisfies  the 
following  conditions: 

• (C6)  r("+1)(a)  > $Ln+1)(ct)  for  a G A(”+1> 

• (C7)  r<"+1)(a)  = 4n+1)(a)  for  a = 0. 

By  incorporating  the  surrogate  function  r(n+1)  with  the  constraint  a G A(n+1),  an 
alternative  to  Step  2 is: 

• Step  2a  Get  the  accelerated  PML  iterate  = ;*4n+1)  + rdn+1M"+1),  where 

Q.in+i)  ^ argmin  r^"+1^(a)  subject  to  a G A^n+1^  . (4-19) 

a 

It  is  important  to  point  out  that,  for  convenience,  x(n+1)  has  been  re-defined  in 
Step  2a.  This  new  definition  will  be  used  throughout  the  rest  of  the  dissertation. 
In  Figure  4 2,  the  alternative  pattern  search  step  with  the  surrogate  function  r^n+1^ 
is  illustrated.  In  Figure  4 2 (a),  a two-dimensional  example  of  <4>  is  shown  with  a 
direction  vector  t/n+1).  In  addition,  the  one-dimensional  slice  of  4>  along  the  direction 
vector  which  we  denote  4>q1+1\  and  a surrogate  function  T^n+1^  that  satisfies 

(C6)  and  (C7)  are  shown  in  Figure  4 2 (b). 

By  design,  the  sequence  {%(n)}  produced  by  Steps  1,  2a,  and  3 satisfies  the 
monotonicity  condition  (C4).  To  see  this  fact,  note  that  4>(cc(n+1))  = 4>in+1^(a("+1)) 
by  the  definition  of  By  (C6),  it  follows  that  $L"+1)(a(ri+1))  < r^"+1)(o:("+1^). 

Also,  from  the  definition  of  a^"+1)  in  (4.19)  and  the  fact  that  0 G A("+1),  we  obtain 
the  result  r("+1)((Ara+1))  < r^"+1^(0).  Finally,  by  (C7),  it  can  be  concluded  that 
$(5(n+1))  < r(”+1)(0)  = $in+1)(0)  = <F(a4n+1>)  for  all  n. 

We  now  present  our  choice  for  the  surrogate  function  r^n+1^  that  satisfies  (C6) 
and  (C7).  First,  note  that  the  negative  log  likelihood  function  — 4/  can  be  expressed 
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as 

/ i 

~^{x)  = ^2{[Px\t  - di  log  ([Pa;  + p],)}  + + log(rfj)}  (4.20) 

i= 1 i=l 

1 

= + C*5  , (4.21) 

i= 1 

where  = t — di\og(t  + p,;)  and  C5  = {p>  + l°g(rf» !) } • Suppose  a function, 

can  be  found  such  that 

• (C8)  08(n+1)(a)  > 4n+1)(a)  for  a £ A{n+1) 

• (C9)  dln+1\a)  = ipln+1\a)  for  a = 0, 

where  ^+l\a)  = ^([Px^+%  + a[Vv^n+%).  Then,  the  function 

7 

0(n+D(a)  ^ J2  e[n+1\a)  + C5  (4.22) 

i=l 

will  satisfy  the  conditions 

• (CIO)  0(n+1)(a)  > -T(a;(n+1)  + av(n+1))  for  a £ Al"+1) 

• (Cll)  ©l"+1l(a)  = -^(a;("+1)  +an(n+1l)  for  a = 0. 

A function  that  satisfies  (C8)  and  (C9)  is 

et+l\a)  = /4"+1)y  + ^(”+1)(0 )a  + ^("+1)(0)  , (4.23) 

where  pjn+1)  = ma x{fyn+1\a)  subject  to  a £ A(n+1)}.  From  the  definition  of  0jn+1), 
it  is  obvious  that  (9jn+1)(0)  = ^"+1)(0).  Thus,  (C9)  is  satisfied.  To  see  that  6^ra+1) 
satisfies  (C8),  consider  the  function  2|"+1'(q)  = 6^n+1\a)  — '0jnt^(a).  From  the 
definition  of  p-n+1\  it  is  clear  that  > 0.  Thus,  it  follows  that  is  a convex 

function.  Moreover,  a = 0 is  a minimizer  of  z-n+1)  because  i-n+1)(0)  = 0 by  the 
definition  of  Since  z-,l+1)(0)  = 0 by  (C9),  it  is  straightforward  to  see  that 

z-n+1\a)  > 0.  This  result  implies  that  #j"+1>(a)  > ^n+1\a)  for  all  a € A("+1).  So, 
(C8)  is  satisfied.  To  calculate  it  is  worthwhile  to  note  that  the  set  Aln+1l  can 
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be  written  as  Ab,+1)  = {a  : L^+1)  < a < fAn+1)},  where 

’ £(n+i)  _ x(n+b 


L(n+1)  = max 


(»i+i) 


j such  that  ujn+1^  > 0 


A f A(n+1)  _ ™(n+l) 

U {n+1)  = min  i : j 


j such  that  u|"+1)  < 0 
J j 


(n+l) 


(4.24) 

(4.25) 


Observe  that  L(n+1)  < 0 and  £/(,l+1)  > 0.  Since  the  second  derivative  of  ^"+1) 


is 


di([Vv^+%Y 


([Pv(n+%a  + [Px(n+%  + Piy  ’ 
the  maximum  second  derivative  of  ^-"+b  for  a £ A^n+1^  is 


(4.26) 


^ +1)  = < 


M(^+1>],)2 f7?„(«+i)l . > o 

([pv(n+1)]jL("+i)+[ra(n+1)]j+ft)2i  v v J*  ^ u 

Ml ?y(n+1)].)2 fT>„(«+l)l.  < 0 

([pt;(n+l)].f/(n  + l)  + [-pa;(n  + 1)].+/3,)2l  P U U ^ U 

0,  [Pv(-n+%  = 0 . 


(4.27) 


It  should  be  noted  that  \Pv(n+l\a  + [Px^n+V>]i  + pi  > 0 for  a G A(n+b  (see  (ASl)). 

At  this  point,  we  need  a surrogate  function  for  the  penalty  function  A once  again. 
As  mentioned  in  Chapter  3,  under  assumptions  (AS3)-(AS8),  Huber  developed  a 
surrogate  function  for  A,  denoted  by  A(n)  (see  (3.14)).  By  the  properties  of  the 

surrogate  function  \(n\  it  is  clear  that  the  surrogate  function  A^b,  which  is  defined 

by 

j 

A(n)(aO  = E E ^(n)(M  - xk)  , (4.28) 

j= 1 keNj 

satisfies  A(n)(»)  > A(x)  for  all  x and  A(")(aA"))  = A(x(ri)).  Thus,  A(n+1)(a;(n+1)  + 
au(n+1))  will  satisfy 

• (C12)  A(n+1)(x(«+i)  + av(n+1))  > A(aAn+1)  + au(n+1))  for  a € A<n+1) 

• (C13)  A(n+1)(x(n+1>  + ai/n+:b)  — A(xln+b  4-  a'i/"+b)  for  a = 0. 

Finally,  by  (C10)-(C13),  it  is  clear  that  the  function 


F(«+i)(a)  £ 0("+1)(a)  + (3A(n+1\xin+V  + av («+!)) 


(4.29) 
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satisfies  (C6)  and  (C7). 

To  solve  (4.19),  we  first  determine  the  unconstrained  minimizer  of  rG+b: 


d(n+1)  = argmin  r(n+1)(a) 


(4.30) 


Since  T(n+1)  is  strictly  convex  by  j(t)  > 0 for  — oo  <t<  oo,  t>G+b  ^ 0,  (ASl),  and 
(AS2)  (see  Appendix  D),  the  expression  for  dG+b  can  be  found  by  simply  computing 
the  first  derivative  of  pG+b  and  setting  it  to  zero  (see  Appendix  D): 


^”+1,(0)  - P zu  ^V4"+1’  - 4”+,’)(*> 


(n+1) 


„(n+1h/’„,G+b 


ct(n+1) 


(n+l) 


Ei=i  ^ + P EU  * } - 4 )2 


(4.31) 

Given  (4.24),  (4.25),  and  (4.31),  the  solution  to  the  constrained  optimization  problem 
in  (4.19)  is 


a 


(n+l) 


u{n+1\ 

if  a(n+ b > t/(n+b 

r (n+l) 

) 

if  dG+b  < pG+b  . 

(4.32) 

aG+b 

otherwise 

All  that  remains  now  is  to  show  that  the  sequence  {x^}  produced  by  Step  1, 
Step  2a  with  rG+b  in  (4.29),  and  Step  3 satisfies  (C5).  To  see  this,  note  that  by 
(C6),  (C7),  and  the  definition  of  a^n+1\  we  have  the  following  inequality 


$(a:(n+1))  - $(®G+b)  > T(,l+1)(0)  - r(n+1)(a(n+1))  . 


(4.33) 


Suppose,  for  each  n,  the  function  rG+b  is  expanded  into  a second-order  Taylor  series 
about  the  point  cnG+b  and  evaluated  at  a = 0.  Then,  the  right  hand  side  of  (4.33) 
can  be  written  as 

rG+b(0)  - r(n+1)(a(n+l))  = rG+1)(a(n+1))(_aG+D)  _|_  IfG+b^G+b^-QjG+b^  ^ 

(4.34) 
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where  a(n+1)  is  a point  between  0 and  a(n+1).  By  the  strict  convexity  of  r(n+1\ 
£('»+!)  < q_  ancj  [/("+i)  > o,  it  follows  that  r(n+1)(a(ri+1)^_a(«+i))  > q for  all  n. 

Thus, 

T(n+1)(0)  - r(n+1)(a(n+1))  > If("+1)(a(n+1>)(a(n+1))2  . (4.35) 

Since  there  exists  a symmetric  positive  definite  matrix  A4,  which  is  independent  of 
n,  such  that  f («+1)(a("+1))  > 2{v^)' M{v^n+^)  (see  Appendix  E),  it  follows  that 

r(n+1>(0)  - T(n+1)(a(n+1))  > (r;("+1))/M(v(n+1))(a(ri+1))2  . (4.36) 


Hence,  to  prove  (C5),  we  show  that  there  exists  some  constant  C2  > 0 such  that 


(v("+1)yx(t,("+1))  > C2||v(n+1)||2 , 


(4.37) 


where  we  used  the  fact  that  ||x(n+1)  — ®(n+1)||2  = (a(n+1))2||v(n+1)||2  by  the  definition 
of  Since  M is  a symmetric  matrix,  it  can  be  factored  as  M = TAT'  by  the 

Spectral  theorem  [63,  p.309],  where  the  columns  of  the  matrix  T contain  orthonormal 
eigenvectors  of  A4  and  the  diagonal  matrix  A contains  corresponding  eigenvalues 
along  its  diagonal.  Using  the  fact  that  T'T  produces  the  J x J identity  matrix  and 
Rayleigh’s  quotient  with  i/n+1)  = Y z(n+1)  (i.e.,  z(n+1)  = YV^1))  [63,  p.348],  it 
follows  that 


(-y(n+1))'_A4(v("+1)) 

||^(n+l)  112 


(Yz(n+1))'A4(Yz^"+1)) 

(Yz("+1))'(Yzb+1)) 

(z(n+l))/^(z(n+l)) 

~ (z(n+1))/(z(n+1)) 

e,(4"+1>)2 

_ 7 


(4.38) 

(4.39) 

(4.40) 

(4.41) 


where  {ej}j=1  are  the  eigenvalues  of  A4  and  em  is  the  smallest  eigenvalue.  Since 
M is  positive  definite,  em  is  positive.  Therefore,  with  c2  = em  > 0,  we  obtain 
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(i>(n+i)y.M('u(n+i))  > C2||u^n+1^||2  and 

4>(z:(n+1))  - $(x(”+1))  > c2  ||x(”+1)  - x(n+1)||2.  (4.42) 

4.3  Direction  Vectors 

In  some  algorithms,  the  gradient  of  a function  is  used  as  a direction  vector,  as 
in  the  method  of  steepest  descent  [55,  ch.  14].  Suppose  the  direction  vector  r/n+1) 
was  chosen  to  be  the  gradient  of  the  PML  objective  function  <f>  evaluated  at  the  PML 
iterate  (i.e. , rhn+P  = V$(®*n+1^)).  Then,  the  gradient  must  be  calculated  at 

each  step.  However,  the  computational  cost  of  the  gradient  of  4>  is  on  the  same  order 
as  the  computational  cost  of  single  PML  iteration.  Moreover,  in  experiments,  the 
APML  algorithm  with  r>(n+!)  = V<I>(aAl+1))  decreased  the  PML  objective  function 
slower  than  the  PML  algorithm. 

The  current  direction  vector  in  the  pattern  search  step  by  Hooke  and  Jeeve  is 
the  difference  of  the  two  most  recent  iterations.  Specifically,  the  direction  vector  is 
defined  by  iAn+P  = cc("+1)  — x^n\  This  choice  can  be  justified  in  a reasonable  manner. 
To  see  why,  assume  that  a closed-form  expression  is  available  for  the  minimizer  of 
^)(a;(n+1)  _|_  Q;t;(n+1)).  Then,  the  “best”  direction  vector  is  (®("+i)  — x*)  because 
<I>(ahn+b  + a?;(n+1))  “contains”  the  point  x*  (note:  x*  would  result  with  a = — 1), 
where  x*  is  the  minimizer  of  the  PML  objective  function.  However,  x*  is  not  known 
at  the  nth  iteration.  The  “best”  estimate  of  x*  at  the  nth  iteration  is  x^n\  namely 
the  accelerated  iterate  at  the  (n  — l)t/l  iteration.  In  this  section,  we  introduce  a 
direction  vector  that  works  better  than  Hook  and  Jeeve’s  direction  vector  in  terms 
of  convergence  rate. 

The  direction  vector  we  choose  is  a simple  variation  of  p(ra+1)  that  is  not  com- 
putationally expensive  (note:  J subtractions  are  required  for  £>(n+1)).  Since  there 
are  no  positrons  emitted  outside  the  subject  being  scanned,  the  PML  estimate  will 
contain  many  values  near  zero.  This  claim  is  supported  by  Figure  4 3.  In  the  figure, 
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Figure  4 3:  PML  iterates  are  shown:  (a)  the  13th  PML  iterate,  (b)  the  l^th  PML 
iterate,  (c)  the  15th  PML  iterate,  and  (d)  the  1000^'  PML  iterate.  The  images  were 
generated  using  a real  thorax  phantom  data.  The  plane  considered  contains  activity 
due  to  the  heart,  lungs,  spine,  and  background.  For  display  purposes,  all  the  images 
were  adjusted  so  that  they  have  the  same  dynamic  range. 
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PML  images  corresponding  to  different  iteration  numbers  are  shown.  The  images 
were  generated  by  applying  the  PML  algorithm  to  real  thorax  phantom  data  (scan 
duration  was  14  minutes)  with  an  uniform  initial  estimate.  The  penalty  parameter 
was  f3  = 0.02  and  A (t)  = log(cosh(|))  with  S = 50  was  the  penalty  function.  As 
can  be  seen  in  Figure  4 3 (a),  (b),  and  (c),  the  early  iterates  contain  values  near 
zero  outside  of  the  body.  Consider  the  figure  in  Figure  4 3 (d),  which  is  the  1000th 
PML  iterate  (practically  speaking,  the  iterate  is  the  minimizer  of  <£>  because  it  did 
not  decrease  after  the  791st  iteration  up  to  the  5000th  iteration).  The  1000th  iterate 
also  contains  many  values  near  zero  outside  the  body.  Inside  the  body,  on  the  other 
hand,  the  1000th  iterate  is  very  different  from  the  early  iterates.  From  the  above 
observations,  it  can  be  said  that  convergence  rate  varies  significantly  between  voxels 
inside  the  body  and  voxels  outside  the  body. 

From  the  above  observations  that  the  voxels  outside  the  body  converge  faster 
than  the  voxels  inside  the  body  and  the  voxels  outside  the  body  converge  to  values 
near  zero  that  is  the  boundary  of  the  set  {cc  : x > 0},  we  claim  that  it  is  better  to 
search  for  an  accelerated  iterate  along  the  boundary  whenever  the  current  iterate  is 
“near”  the  boundary.  By  “boundary”,  we  mean  the  set  {*>0:^  = 0 for  some  j}. 
This  can  be  explained  by  the  example  shown  in  Figure  4 4.  In  Figure  4 4 (a),  the 
second  PML  iterate  is  heading  toward  the  x2-axis.  If  we  perform  the  acceleration  step 
with  the  direction  vector  jAn+1),  then  the  accelerated  iterate  will  lie  on  the  a^-axis  as 
shown  in  Figure  4 4 (b).  A better  direction  to  be  searched  is  the  one  that  is  parallel 
to  the  X2-axis  as  shown  in  Figure  4 4 (c).  The  principle  of  the  proposed  direction 
vector  is  to  exclude  the  coordinates  corresponding  to  the  voxel  values  that  are  “near” 
the  boundary.  An  easy  way  to  incorporate  this  idea  is  to  determine  the  voxels  whose 
values  are  less  than  a small  positive  value,  e.  Specifically,  the  proposed  direction, 
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(a)  (b)  (c) 


Figure  4 4:  Direction  vectors:  (a)  Standard  PML  iterates  are  shown,  (b)  an  acceler- 
ated iterate  (the  single  solid  circle)  is  shown  by  using  the  direction  vector  C/(n+1\  and 
(c)  an  accelerated  iterate  (the  single  dotted  circle)  is  shown  Dy  using  the  proposed 
direction  vector  i++1\  The  double  circles  denote  PML  iterates.  The  mark  x denotes 
the  minimizer  of  the  function  subject  to  the  constraints  x\  > 0 and  x2  > 0. 
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1. 1 Properties  of  the  APML  Algorithm 

We  now  provide  a summary  of  the  desirable  properties  of  the  APML  algorithm: 

• The  APML  algorithm  needs  only  one  additional  parameter  (i.e.,  e),  whereas 
ordered-subsets  based  algorithms  [33, 35]  need  at  least  three  extra  parameters 
(i.e.,  a relaxation  parameter,  the  number  of  subsets,  and  a small  positive  number 
that  is  set  to  any  nonpositive  elements  of  an  iterate). 

• In  experiments,  the  proposed  direction  vector  performed  better  than  the  direc- 
tion vector  put  forth  by  Hooke  and  Jeeve  [41,  pp.  287-291], 

• The  APML  algorithm  monotonically  decreases  the  PML  objective  function  un- 
like the  algorithms  in  [23,35]. 
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• The  APML  algorithm  theoretically  guarantees  nonnegative  iterates,  whereas 
some  algorithms  [33,  35]  set  any  negative  element  of  the  iterates  to  a small 
positive  number. 

• The  APML  algorithm  can  incorporate  a large  class  of  edge-preserving  penalty 
functions  unlike  the  algorithm  by  De  Pierro  [30]. 

• The  APML  algorithm  converges  to  the  minimizer  of  the  PML  objective  function. 
Convergence  proofs  for  the  algorithms  in  [23, 33]  are  not  available. 


CHAPTER  5 

QUADRATIC  EDGE  PRESERVING  ALGORITHM 


In  this  chapter,  we  present  a regularized  image  reconstruction  algorithm  that 
aims  to  preserve  edges  in  the  reconstructed  images  so  that  fine  details  are  more  re- 
solvable. We  refer  to  the  proposed  algorithm  as  the  quadratic  edge  preserving  (QEP) 
algorithm.  The  QEP  algorithm  results  via  a certain  modification  of  the  surrogate 
function  $0)  for  tRe  PML  objective  function.  It  should  be  mentioned  that  the  QEP 
algorithm  was  first  introduced  in  [18]. 

Recall  that  the  nth  surrogate  function  for  the  PML  objective  function  $, 
can  be  expressed  as  $(")(x)  = <t>f\xj)  + C^n)  (see  (3.29)  and  (3.35)).  For  the 

discussion  to  follow,  it  will  be  convenient  to  express  <p^n>  in  (3.35)  in  a different  manner 


<t>\ '?{t ) = Efhog^  + F^e  + G^t 
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and  rrijl\  EjH\  F^"] , and  G'*")  are  defined  in  (3.24),  (3.30),  (3.31),  and  (3.32),  respec- 
tively. Note  that  the  jth  element  of  the  next  PML  iterate  affn+1),  a^n+1\  is  defined  to 
be  the  nonnegative  minimizer  of  the  function  0)n).  The  function  hJfi}  is  quadratic  so 
its  aperture,  ujjk 7(x^  — x[n)),  and  minimizer,  are  expected  to  play  key  roles  in 
the  regularization  process  (for  a quadratic  function,  f(t ) = at2  + bt  + c , the  constant 
a is  called  the  aperture  of  the  function).  To  see  the  role  of  the  function  h^)  on  deter- 
mining Xj” suppose  (3  = 0 in  (5.5).  Then,  = l^\t)  and  the  minimizer  of  the 

function  4>\n\ t ) is  the  “pure”  log  likelihood  iterate  (i.e..  minimizer  of  l^l> ) . For  (3  7^  0, 
intuitively  speaking,  it  is  evident  that  the  functions  {h^}keN:l  act  to  bias  x'-1 1 1 1 from 
the  pure  log  likelihood  iterate  towards  the  minimizers  of  {h^jkeNj  (observe:  the 
last  term  in  (5.5)  is  independent  of  x).  The  degree  of  influence  that  the  functions 
{h'jk  [rev,  possess  is  controlled  by  their  apertures  and  the  penalty  parameter  (3 . 

To  highlight  the  role  of  the  function  7,  we  let  (3  = | and  ui3k  = 1 for  all  j and  k.  In 
this  case,  the  aperture  of  equals  7 (x^  — x["^).  For  the  quadratic  penalty  function 
(i.e.,  A (t)  = t 2),  the  aperture  of  is  a constant  for  all  j,  k , and  n.  Consequently, 
for  all  k e Nj,  the  function  has  the  same  degree  of  influence  regardless  of  the 
absolute  difference  between  xjn)  and  x[n).  Practically  speaking,  this  means  that  the 
quadratic  penalty  will  overly  smooth  edges.  To  preserve  edges,  it  would  be  helpful 
to  lessen  the  degree  of  influence  of  the  functions  {h'j^}keNj  whenever  the  absolute 
difference  between  and  xj^n)  is  sufficiently  large.  Figure  5 1 is  a plot  of  7 when 
A(t)  = log(cosh(|))  (recall:  7(f)  = A (t)/t  in  (AS6))  is  used  in  the  penalty  function 
with  5 = 10,  50, 100, 500.  From  the  figure,  it  can  be  seen  that  7 becomes  very  small 
compared  with  7(0)  for  |f|  >>  6.  This  means  that,  when  |x^n)  - xj;n)|  >>  5,  h%]  will 
have  a “very  small”  aperture,  and  consequently  the  function  will  not  have  much 
influence  on  x^‘ 1 1 1 . Said  another  way,  the  log-cosh  penalty  function  helps  preserve 
edges  whose  “heights”  are  on  the  order  of  6. 
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Figure  5 1:  The  function  7 is  shown  when  A (t)  = log(cosh(|))  is  used  as  the  penalty 
function  with:  (a)  <5—10,  (b)  5=50,  (c)  5=100,  and  (d)  5=500. 


We  will  now  move  the  discussion  from  the  aperture  of  the  function  h^}  to  the 
minimizer  of  hff.  As  stated  previously,  the  minimizers  of  the  functions  {h^}keNj 
play  an  important  role  in  the  regularization  procedure.  More  specifically,  when  the 
aperture  of  is  sufficiently  large,  the  (n+l)si  iterate  is  biased  towards  the  minimizer 
of  h^\  which  is  mJ]k  = {x{-l)  + xk'])/2.  As  a result,  there  is  inherent  averaging  that 
takes  place  with  the  PML  algorithm. 

Consider  a penalty  function,  where,  for  certain  functions  {<r^},  it  is  of  the  form 

A$?(*)  =2EEAAs)-  <5'8> 

j= 1 keNj 

In  order  to  better  preserve  edges,  we  believe  an  improvement  would  be  to  construct 
cTj7^  so  that  it  has  the  same  aperture  as  , but  a different  minimizer.  The  minimizer 
of  (Tjl\  which  would  depend  on  whether  an  edge  is  present,  is  chosen  to  be 


ujk  = x5n)  - - 4n)) . 


(5.9) 
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where  rj  is  a function  such  that 


v(t) 


e,  e <t 
-f,  *<-£ 

t,  otherwise 


(5.10) 


and  £ is  a parameter  that  represents  the  “heights”  of  the  edges  that  are  to  be  pre- 
served. An  example  of  a function  that  can  be  used  is  r](t)  = £tanh(|).  In  order  for 
Uj£  to  be  the  minimizer  of  we  define  a to  be 


Vjki1)  = Ujkl{x("]  - x[n))(t  - v$)2  . (5.11) 

Using  when  the  absolute  difference  between  x -l)  and  x[n)  is  less  than 
(i.e.,  no  edge  present),  smoothing  takes  place  because  « x^\  On  the  other 
hand,  when  the  absolute  difference  between  x^l)  and  x^  is  greater  than  £ (i.e.,  edge 
present),  less  smoothing  takes  place  because  there  are  upper  and  lower  bounds  for 
utk-  To  help  make  our  idea  clearer,  consider  Figure  5 2.  In  the  figure,  and  uff 
are  plotted  as  a function  of  x^l\  where  xtJ','>  — 800  (value  is  chosen  arbitrarily)  and 
r](t)  = 250tanh(2|Q).  It  can  be  observed  that  is  approximately  x ^ + 250  when 
xk  1 is  larger  than  1 +250  and  is  approximately  x^  — 250  when  is  less  than 
xj  ^ ~ 250.  In  other  words,  the  function  77  prevents  from  being  overly  biased 

towards  x^  when  the  absolute  difference  between  and  x^  is  greater  than  250. 

Using  the  edge  preserving  penalty  function  Acp*.  an  algorithm  for  obtaining  reg- 
ularized estimates  of  the  emission  means  follows: 

• Step  1 Let  > 0 be  the  initial  estimate 

• Step  2 Construct  A^(a:)  from  the  current  estimate  using  (5.8),  (5.9), 
(5.10),  and  (5.11) 

• Step  3 Compute  a:(n+1)  = argmin  E(n)(aj)  subject  to  x > 0.  where  E^(x)  = 
-V(x)+pA&\x). 

• Step  4 Iterate  between  Steps  2 and  3. 
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Figure  5 2:  Plots  of  rn^  and  versus  x j," * are  shown,  where  x^l)  = 800  is  fixed 
and  77(f)  = 250tanh(^Q). 


It  turns  out  that  a closed  form  expression  for  the  problem  in  Step  3 is  not  possible. 
An  alternative  to  Step  3 is: 

• Step  3a  Find  ®^n+0  such  that  E(n)(a:(n+0)  < 

The  problem  in  Step  3a  can  be  solved  by  defining  cc^n+1^  as 

x{n+1)  = argmin  $£>(a?)  , (5.12) 

x>0 

where 

^(x)  = -^)(x)+/5A(")(x)  (5.13) 

and  is  defined  in  (3.12).  Defining  the  iterates  according  (5.12)  and  (5.13)  insures 
that  E(n>(x(n+1))  < E<n>(a:<B>).  To  see  this  fact,  note  that  (*("+!))  < $£>( x <n+1>) 
because  ^(n)(x)  < 'I'(x)  for  all  x > 0.  Since  $ip}(x(n+1))  < <I>ip)(x(n))  by  (5.12),  it 
follows  that 


= E(n)(x(n))  . 


E(n)(£C(n+1))  < $J)(*("+1))  < $g)(*(w>) 


(5.14) 
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All  that  remains  now  is  to  solve  the  optimization  problem  in  (5.12).  Observe 


that  <I>cp)  can  be  written  as 


4>S>(x)  = -#<”>(*) 

I ( J 
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Qn)  = <r + 2/?x  X - 4nV<f , 
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and  Cf},  and  Fjn>  are  defined  in  (3.13),  (3.30),  and  (3.31),  respectively.  Since 

(n) 

$eP  is  de-coupled,  it  follows  that  the  solution  to  (5.12)  is  given  by 


An) 


(5.19) 

(5.20) 


a>+1)  = arg  min  {E^n)  log(xj)  + Fjn)x?  + ffjn)xj } , j = 1,  2, . . . , J . (5.21) 

Repeating  the  steps  used  to  derive  (3.37),  it  is  straightforward  to  see  that  the  solution 
to  the  optimization  problem  in  (5.21)  is 

(„+1,  -^nW<’12-8jt)j7’1 

j 4 

j 


, j = 1,2, ...,  J . 


(5.22) 


CHAPTER  6 

JOINT  EMISSION  MEAN  AND  PROBABILITY  MATRIX  ESTIMATION 

In  Chapters  3,  4,  and  5,  we  assumed  that  the  probability  matrix  V accounts  for 
errors  due  to  attenuation,  detector  inefficiency,  detector  penetration,  non-collinearity, 
and  scatter.  However,  we  now  assume  that  an  I x J corrected  matrix  Vc  is  avail- 
able that  accounts  for  errors  due  to  attenuation,  detector  inefficiency,  and  detector 
penetration.  And,  we  develop  a method  for  correcting  errors  caused  by  scatter  and 
non-collinearity.  Before  presenting  the  proposed  method,  we  briefly  review  standard 
approaches  for  obtaining  Vc  in  practice. 

The  ( i,j)th  element  of  Vc , R7 , denotes  the  probability  that  an  annihilation  in  the 
jth  voxel  leads  to  a photon  pair  being  recorded  by  the  ith  detector  pair  when  there  are 
no  errors  due  to  scatter  and  non-collinearity.  In  practice,  one  can  reliably  estimate 
the  probability  matrix  Vc  by  using  the  detector  penetration  correction  method  in  [7] 
with  an  attenuation  correction  method  [9]  and  detector  inefficiency  correction  method 
in  [15].  However,  other  correction  methods  for  detector  penetration,  attenuation,  and 
detector  inefficiency  could  be  used  in  conjunction. 

To  address  attenuation  errors,  two  scans,  known  as  a transmission  scan  and 
blank  scan,  are  taken.  During  a transmission  scan,  1 — 3 rotating-rods  filled  with 
positron-emitting  isotopes  rotate  outside  the  subject  and  the  transmission  data  are 
the  coincidence  sums.  A blank  scan  is  taken  in  the  same  way  as  the  transmission  scan 
except  that  there  is  no  subject  inside  the  scanner.  The  coincidence  sums  during  the 
blank  scan  form  the  blank  scan  data.  Given  the  transmission  and  blank  scan  data,  an 
estimate  of  the  attenuation  map  can  be  generated.  Using  the  estimated  attenuation 
map,  attenuation  correction  factors  are  computed  and  used  for  attenuation  correction. 
Numerous  attenuation  estimation  methods  have  been  proposed  [8  10]. 
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Figure  6 1:  A simple  example  to  illustrate  the  geometry  of  PET  image  reconstruction 
with  three  voxels  (t^,  v2 , and  v3)  and  three  detector  pairs  ((ai,bi), (a2,b2),  and  (03,63)). 
The  dashed  lines  define  the  tubes  of  the  detector  pairs. 

To  address  errors  due  to  detector  inefficiency,  a blank  scan  of  extremely  long 
duration  is  performed.  From  the  blank  scan,  the  relative  efficiency  of  the  detector 
pairs  can  be  estimated.  Recall  that  the  efficiency  of  a detector  pair  is  defined  to  be 
the  probability  that  a coincidence  is  recorded  when  a photon  pair  is  incident  on  the 
detectors.  Consider  a set  of  detector  pairs  where  each  detector  pair  has  same  spatial 
extent.  One  simple  way  to  estimate  the  efficiency  of  a detector  pair  in  the  set  of 
detector  pairs  is  to  define  its  efficiency  to  be  the  ratio  of  the  number  of  photon  pairs 
recorded  by  the  detector  pair  and  the  mean  number  of  photon  pairs  recorded  by  a 
detector  pair  in  the  set  of  detector  pairs.  Once,  the  estimates  of  efficiencies  for  all 
detector  pairs  are  available,  they  can  be  incorporated  into  the  probability  matrix.  To 
address  detector  inefficiency,  more  complicated  correction  methods  such  as  [13  15] 
have  been  proposed. 

As  discussed  in  Section  1.3,  two  photons  generated  by  an  annihilation  usually 
do  not  propagate  in  exactly  opposite  directions,  which  is  called  non-collinearity  of 
line-of-response  [5].  As  illustrated  in  Figure  15,  it  is  possible  that  an  annihilation 
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is  recorded  by  detector  pair  i\  given  that  the  annihilation  would  have  been  recorded 
by  detector  pair  i2  if  both  photons  propagate  in  exactly  opposite  directions,  where 
h 7^  *2- 

Since  scatter  depends  on  the  activity  and  attenuation  within  the  subject  and 
the  scanner  design,  it  is  not  straightforward  to  correct  for  errors  due  to  scatter. 
Photons  lose  energy  when  they  are  under  gone  Compton  scattering.  Thus,  the  energy 
of  detected  photons  may  be  used  to  discriminate  between  unscattered  photons  and 
scattered  photons  [11,  pp.  65-69].  Scatter  correction  methods  that  are  based  on 
multiple  (two  or  more)  energy  windows  have  a limitation  because  detectors  have 
finite  energy  resolution  [46,47].  As  discussed  in  Chapter  2,  state-of-the-art  scatter 
correction  methods,  such  as  the  method  in  [52]  where  the  scatter  distribution  is 
calculated  using  an  analytical  equation,  are  not  practical  because  of  their  extremely 
high  computational  cost.  Consequently,  a simple  scatter  correction  method  would 
be  beneficial  to  the  PET  community.  In  this  chapter,  we  propose  a method  that 
estimates  the  probability  matrix  in  such  a way  that  errors  due  to  scatter  and  non- 
collinearity  are  addressed. 

In  Section  6.1,  we  first  propose  a model  for  emission  data  that  accounts  for  errors 
due  to  scatter  and  non-collinearity.  Then,  in  Section  6.2,  we  present  a method  where 
the  unknown  emission  mean  vector  and  an  unknown  matrix  in  the  proposed  model 
are  jointly  estimated.  Finally,  we  propose  an  algorithm,  referred  to  as  probability  cor- 
rection in  projection  space  (PCiPS)  algorithm,  that  estimates  the  unknown  emission 
mean  vector  and  the  unknown  matrix  in  the  proposed  model. 

6.1  Scatter  Matrix  Model 

Consider  Figure  6 1 in  which  a simplified  PET  scanner  consisting  of  three  de- 
tector pairs  is  depicted.  For  this  discussion,  we  will  focus  on  the  voxels  iq,  v2,  and 
v$  shown  in  Figure  6 1.  Let  the  detector  pairs  (ai,6i),  (a2,b2),  and  ( a3,b3 ) be  the 
first,  second,  and  third  detector  pairs,  respectively.  For  the  scanner  in  Figure  6 1,  we 
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assume  that 


0.04  0 0 


Vc  = 


0 0.04  0 


0.03  0.03  0.02 


(6.1) 


Suppose  during  a scan  there  are  no  errors  due  to  scatter  and  non-collinearity.  Then, 
the  mean  number  of  photon  pairs  recorded  by  the  detector  pair  (ai,6i)  is  Vhxi, 
where  Xi  is  the  mean  number  of  positrons  emitted  from  voxel  V\ . The  mean  num- 
ber of  photon  pairs  recorded  by  the  detector  pairs  (02,62)  and  (03,63)  are  V22x2  and 
(P31X1  + 'P'!2x'2  + P33X3),  respectively,  where  x2  and  x3  are  the  mean  number  of 
positrons  emitted  from  voxels  V2  and  v3 . respectively.  As  discussed  in  Section  1.3,  a 
photon’s  original  flight  path  is  altered  when  it  undergoes  Compton  scattering.  Conse- 
quently, an  annihilation  in  v\  that  would  have  been  recorded  by  detector  pair  (01,61) 
if  both  photons  had  not  undergone  Compton  scattering  may  instead  be  recorded  by 
detector  pair  ( a2,b2 ) when  scatter  occurs.  When  we  consider  non-collinearity,  it  is 
also  possible  that  an  annihilation  in  v\  is  recorded  by  detector  pair  {a2lb2)  given  that 
the  annihilation  would  have  been  recorded  by  detector  pair  (01,61)  if  both  photons 
propagate  in  exactly  opposite  directions. 

Let  denotes  the  conditional  probability  that  an  annihilation  is  recorded  by 
detector  pair  i\  given  that  the  annihilation  would  have  been  recorded  by  detector 
pair  i2  if  both  photons  produced  by  the  annihilation  had  not  undergone  Compton 
scattering  and  both  photons  propagate  in  exactly  opposite  directions,  where  i\  = 
1,2,...,/  and  *2  = 1,2,...,/.  It  is  through  these  unknown  probabilities  {/C?1,2} 
that  we  model  scatter  and  non-collinearity.  Now,  accounting  for  scatter  and  non- 
collinearity,  the  mean  number  of  photon  pairs  recorded  by  detector  pair  (ai,6i)  is 
{ICnVcnXi  + ICi2'P22X2  + ICi3(Vl1Xi+Vc32X2+Vl3X3)}.  Moreover,  the  mean  number  of 
photon  pairs  recorded  by  detector  pairs  (02,62)  and  (03,63)  are  {K,2\V\lX\+K22P22x2+ 
£23(^31^1 +^32^2+’£,33£3)}  and  {fcz\V\1Xx+)C32V22X2  +/C33 (V3lXi  +V32X2+V33X3)}, 
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respectively.  In  matrix  notation,  the  mean  of  the  data  E[D]  can  be  expressed  as 


1 

to 

CO 

1 

0.04  0 0 

xx 

E\D]  = 

/C-21  K 22  1C  23 

0 0.04  0 

X2 

^31  ^32  ^33 

0.03  0.03  0.02 

X‘A 

= /CVKx  , 


(6.2) 

(6.3) 


where  x = [xi  X2  x$  and  the  I x I matrix  1C  denotes  the  first  matrix  on  the  right- 
hand  side  of  (6.2).  We  will  refer  to  1C  as  the  scatter  matrix.  Since  the  emission  means 
can  be  expressed  as  lCVcx  in  this  model,  we  define  the  “true”  probability  matrix  as 


Rtruc  A KVC  ' (6.4) 

where  the  ( i,j)th  element  of  'Ptruc,  'P[™c , denotes  the  probability  that  an  annihilation 
in  the  jth  voxel  leads  to  a photon  pair  being  recorded  by  the  ith  detector  pair.  To 
our  knowledge,  the  scatter  matrix  model  we  propose  is  not  available  in  literatures. 
However,  similar  factorization  of  'Ptruc  in  (6.4)  can  be  found  in  [28,53,64,65]. 

With  the  definition  of  'Ptruc  in  (6.4),  the  mean  number  of  photon  pairs  recorded 
by  the  ith  detector  pair  can  be  expressed  as 

<«-s) 

j=l  j= 1 i2~l 


1 J 

= E E <6-6) 

*2=1  j= 1 

/ 

= Y,K,h[P‘x}l2.  (6.7) 

*2=1 


In  other  words,  the  mean  number  of  photon  pairs  recorded  by  the  ith  detector  pair 
is  a weighted  sum  of  the  mean  number  of  photon  pairs  recorded  by  all  detector  pairs 
when  there  are  no  errors  due  to  scatter  and  non-collinearity.  Regarding  the  matrix  1C, 
two  constraints  are  necessary.  Since  1C  is  a probability  matrix,  it  must  be  true  that 
0 < lClll2  < 1 for  all  i\  and  i 2.  Recall  that  /C(1,2  is  the  conditional  probability  that 
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an  annihilation  is  recorded  by  detector  pair  i\  given  that  the  annihilation  would  have 
been  recorded  by  detector  pair  i2  if  both  photons  produced  by  the  annihilation  had 
not  undergone  Compton  scattering  and  both  photons  propagate  in  exactly  opposite 
directions.  Since  a photon  pair  that  is  recorded  by  detector  pair  i\  can  not  be  recorded 
by  the  other  detector  pairs,  we  require  that 

/ 

Y lChi2  < 1 for  all  i2  . (6.8) 

*i=i 

6.2  Joint  Minimum  Kullback-Leibler  distance  Method 

We  now  consider  the  Poisson  model  by  Shepp  and  Vardi  [16].  In  their  model,  the 
emission  data  d is  an  observation  of  a random  vector  D that  is  Poisson  distributed 
with  mean  ( VtIucx  + p),  where  x is  the  unknown  emission  mean  vector  and  p is  the 
known  mean  accidental  coincidence  rate. 

Since  d is  the  ML  estimate  of  the  unknown  mean  (Ptiacx  + p)  1 , it  can  be  said 
that  d « (VtIucx  + p).  Using  the  definition  in  (6.4),  we  can  state  that 

d « ICVcx  + p . (6-9) 

It  is  important  to  point  out  that  there  are  two  unknowns  in  (6.9):  the  scatter  matrix 
/C  and  emission  mean  vector  x.  Thus,  given  the  emission  data  d,  mean  accidental 
coincidence  rate  p,  and  probability  matrix  Vc,  the  problem  of  interest  is  to  estimate 
the  emission  mean  x and  scatter  matrix  K.  We  define  the  estimate  (x,IC)  to  be  a 


1 If  d is  an  observation  of  a random  vector  D that  is  Poisson  distributed  with 
unknown  mean  A,  then  the  likelihood  function  for  d is  Pr{D  = d|A}  = ^e_A.  Since 
d maximizes  the  log  likelihood  function  log  Pr{D  = d|A},  d is  the  ML  estimate  of  A. 
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minimizer  of  the  Kullback-Leibler  (KL)  distance  [66]  between  d and  ( KLVcx  + p): 

(x,  KL)  = arg  min  KL(d.  ICV'  x + p)  subject  to 

( QC . /C ) 

/ 

x > 0,  /C  > 0,  and  KLni2  < 1 for  all  i2  , (6.10) 

u=i 

where  the  KL  distance  between  a and  b is  defined  by 

KL(a,  b)  = ^ ^aj  log  ^ - a,  + b^j  (6.11) 

and  KL  > 0 means  that  /Qp2  > 0 for  all  i\  and  i2-  The  definition  of  the  estimate  of 
(x,  KL)  is  motivated  in  part  by  the  fact  that  in  [26],  Byrne  derived  Shepp  and  Vardi’s 
MLEM  algorithm  [16]  by  minimizing  the  KL  distance  between  the  emission  data  d 
and  Vtruex  using  the  alternating  projection  algorithm  by  Csiszar  and  Tusnady  [67], 
where  'Ptrue  is  assumed  to  be  known.  It  should  be  mentioned  that  Byrne  derived  the 
MLEM  algorithm  when  p = 0. 

6.3  Probability  Correction  in  Projection  Space  (PCiPS)  Algorithm 

Since  it  is  difficult  to  solve  the  problem  in  (6.10),  we  propose  an  alternating 
minimization  algorithm  where,  in  a repetitive  fashion,  one  of  the  two  unknowns  (i.e., 
KL  and  x)  is  estimated  while  the  other  is  fixed.  Suppose  an  initial  estimate  for  the 
scatter  matrix  KL  (e.g.,  an  I x I identity  matrix),  denoted  by  KL^°\  and  an  initial 
estimate  x ^ > 0 are  available.  Then,  for  n = 0, 1, 2, . . .,  the  steps  of  the  proposed 
algorithm  are 

• Step  1 Get  the  current  estimate  of  emission  mean  vector  a^n+T  using  KSn\  the 
current  estimate  of  the  scatter  matrix  KL: 

x{n+1)  = arg  min  KL(d,  IC{n)Vcx  + p)  . 
x>0 


(6.12) 
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• Step  2 Get  the  current  estimate  of  the  scatter  matrix  AGn+1*  using  x^n+1\  the 
current  estimate  of  the  emission  mean  vector  x: 


/C(n+1)  = argmin  KL(d.  KVcx{n+l)  + p) 

/c>  0 

i 

subject  to  ICixi2  < 1 for  all  ?2  • (6.13) 

*i=i 

• Step  3 Repeat  the  steps  above  until  some  chosen  stopping  criterion  is  met. 
Note  that  the  problem  in  Step  1 can  be  solved  using  the  MLEM  algorithm  [16,26]. 
The  reason  is  because  Byrne  showed  that  Shepp  and  Vardi’s  MLEM  algorithm  [16] 
minimizes  the  KL  distance  between  the  emission  data  d and  VUm'x.  where  'Ptruc  is 
known.  Specihcally,  given  an  initial  guess  xtn  ’°)  > 0.  the  iteration  for  obtaining  ah”+i) 
is 


(n,m+l) 


,(n,m) 

i 


' [/C<n)Pc]0-xf,m) 

[ICWV'xM  + p\i 


j = 1,2,.. . , J , 


(6.14) 


for  m = 0, 1, 2, ... , M\.  The  current  estimate  of  x is  defined  to  be  x ("+1)  = x(«,ati+i) ^ 
the  M[h  iterate  of  (6.14). 

One  of  the  issues  surrounding  the  constrained  minimization  problem  in  (6.13)  is 
that  it  is  impossible  to  estimate  the  scatter  matrix  K-  when  all  of  the  elements  in  the 
matrix  are  assumed  to  be  unknown.  The  reason  is  because  there  would  be  too  many 
unknowns,  which  would  result  in  the  problem  being  under-determined  (note:  the  di- 
mension of  /C  is  / x J,  while  there  are  only  I data  points).  To  reduce  the  number  of 
unknowns  in  AC,  we  assume  that  = 0 if  the  detector  pairs  i\  and  are  not  in  the 
same  projection  (see  Figure  1 3 for  the  definition  of  a projection).  This  assumption 
means  that  an  annihilation  that  would  have  been  recorded  by  a detector  pair  within  a 
certain  projection  if  both  photons  had  not  undergone  Compton  scattering  cannot  be 
recorded  by  a detector  pair  within  some  other  projection.  Under  the  stated  assump- 
tion, the  number  of  unknown  parameters  in  the  matrix  K.  is  dramatically  reduced. 
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Moreover,  /C  is  a block  diagonal  matrix  with  T x T sub-matrices  along  its  diagonal 


/Ci  0 0 0 

0 /C2  0 0 

0 0 0 
0 0 0 K.s 


(6.15) 


where  T is  the  number  of  detector  pairs  within  each  projection  (e.g.,  160),  S is  the 
number  of  projection  angles  (e.g.,  192),  and  it  is  assumed  that  emission  counts  of 
the  detector  pairs  within  a projection  are  placed  as  a “chunk”  in  the  emission  data 
d (i.e. , d = [(1st  projection),  ( 2nd  projection),  . . . , ( Ith  projection)]'). 

Since  K,  is  a block  diagonal  matrix,  the  minimization  problem  in  (6.13)  can  be 
broken  into  S sub-problems.  Consider  the  sinogram  of  d , denoted  by  jV,  which  is  a 
T x S matrix.  The  sth  column  of  the  sinogram  y is  the  projection  that  corresponds 
to  a certain  projection  angle.  Figure  6 2 (a)  shows  an  example  of  a sinogram.  In  the 
figure,  the  sinogram  of  the  14  minute  emission  data  for  plane  21  is  shown.  The  /xl 
vector  Vcx('n+1^  is  known  in  the  PET  community  as  the  forward  projection  of  x^n+l\ 
We  define  a matrix  W(-n+1'>  where  the  first  column  is  the  first  S elements  of  'Pcab"+b 
and  the  second  column  contains  the  (S  + l)th  to  (2 S)th  elements  of  Vcx^n+1^  and  so 
on.  In  other  words,  the  ordering  of  the  detector  pairs  associated  with  Wk"+b  and 
y match.  Figure  6 2 (b)  shows  an  example  of  VV^"+1\  where  a^n+b  was  generated 
by  running  the  MLEM  algorithm  for  1000  iterations  on  the  emission  data  in  Figure 
6 2 (a).  Under  the  assumption  that  — 0 if  the  detector  pairs  i\  and  are  not 
in  the  same  projection,  the  minimization  problem  in  (6.13)  can  be  expressed  as:  for 
s = 1,2,. ..,5, 


£(n+i)  ^ arg  mia  KL {ya,Kaw 

K,>V 


(n+1) 

s 


+ Zs) 


T 

subject  to  Y1K.U  < 1 for  i2  = 1, 2, . . . , T , 

*i=i 


(6.16) 
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(a)  Emission  Data 


(b)  Forward  Projection 


Figure  6 2:  Sinograms:  (a)  14  minute  emission  data  for  plane  21  (i.e.,  y)  and  (b) 
forward  projection  of  at(n+1)  (i.e.,  W(n+1)  = Vcx(-n+ x)),  where  at(u+1)  is  the  1000t/l 
MLEM  iterate  using  the  emission  data  in  (a).  Note  that  the  images  were  adjusted 
with  their  own  dynamic  range. 
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where  zs  = [/0{(s-iyr+i}  P{(s-i)r+2}  . . • P{st}]',  £s  is  the  sth  sub-matrix  of  /C,  and  ys 
and  w{'1 1 1J  are  the  sth  column  of  y and  W(n+1\  respectively. 

Since  there  are  T x T unknowns  in  each  minimization  problem  in  (6.16),  there 
would  be  still  too  many  unknowns.  To  reduce  the  number  of  unknowns,  we  assume 
that,  for  all  s,  Ks  is  a Toeplitz  matrix.  An  example  of  a Toeplitz  matrix  is 
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The  assumption  that  /Cs  is  a Toeplitz  matrix  implies  that  there  are  at  most  T un- 
knowns in  each  sub-matrix  K.s.  Moreover,  the  assumption  means  that  a single  kernel 
can  account  for  scatter  within  each  projection. 

The  assumption  that  JCS  is  a Toeplitz  matrix  can  be  justified  for  regions  with 
approximately  uniform  attenuation,  such  as  the  brain.  Consider  Figure  6 3 in  which  a 
simplified  PET  scanner  consisting  of  four  detector  pairs  is  depicted.  In  the  figure,  the 
dotted  circle  defines  a region  with  uniform  attenuation.  We  refer  to  the  detector  pairs 
(ai,6i),  (a2,b2),  (a3,b3),  and  (a4,b4)  as  the  first,  second,  third,  and  fourth  detector 
pairs,  respectively.  Note  that  the  geometry  of  the  first  and  second  detector  pairs  and 
the  geometry  of  the  second  and  third  detector  pairs  are  approximately  same.  Because 
of  the  approximately  uniform  attenuation  of  the  subject  and  geometric  similarity  of 
the  detector  pairs,  it  can  be  said  that  the  conditional  probabilities  K2\  and  K32  are 
approximately  same.  Using  this  rationale,  for  a projection  of  a PET  scanner,  it  can 
be  assumed  that  ~ JCi3i2,  when  (i2  — i\)  = [i3  — *2).  Thus,  we  can  construct  KLS 
that  can  be  approximated  as  a Toeplitz  matrix. 
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Figure  6 3:  Geometry  of  a simplified  PET  image  reconstruction  problem:  three 
voxels  (ui,  t>2,  and  v3)  and  four  detector  pairs  ((ai,&i), (02,62),  (03,63),  and  (a4,64)). 
The  dashed  lines  dehne  the  tubes  of  the  detector  pairs.  The  dotted  circle  defines  a 
region  with  uniform  attenuation. 


Now,  consider  the  following  functions:  for  an  integer  t. 


Va{t)  = 


[ Vs]t , t = l,2,...,T 
0,  otherwise 


(6.17) 


w«n+1,(t) 


A 


[mi”+1)]t,  t = 1,2,...,T 
0,  otherwise 


(6.18) 


[zs]t,  t — 1, 2, . . . , T 
0,  otherwise 


(6.19) 


Under  the  assumption  that  Ks  is  a Toeplitz  matrix,  ys(t)  is  approximately  equal  to  the 
convolution  of  win+1\t)  and  an  unknown  nonnegative  function,  denoted  by  k[s>T)(t), 
that  depends  on  the  sth  projection  angle.  Thus,  for  t = 1,  2, . . . , T, 

T — 1 

Vs{t)  ~ {k(s,r)  + zs(t)  = kM(u)wln+l) (t  — u)  + za(t)  , (6.20) 

u=— r+l 
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where  k(s,T)(t ) € [0, 1]  for  an  integer  t and  /qSjT)(i)  = 0 for  |t|  > r.  The  parameter  r 
is  defined  by  the  user,  but  must  satisfy  the  constraint  r < [(T  + l)/2j . Observing 


would  have  been  recorded  by  detector  pair  ( t — u)  if  both  photons  produced  by 
the  annihilation  had  not  undergone  Compton  scattering  and  they  had  propagated 
in  exactly  opposite  directions,  but  instead  are  recorded  by  detector  pair  t.  The 
parameter  r restricts  the  number  of  detector  pairs  to  be  convolved.  In  principle, 
the  parameter  r is  to  be  chosen  in  such  a way  that  r is  proportional  to  the  mean 
number  of  scatter  events.  Since  the  mean  number  of  scatter  events  is  proportional  to 
the  attenuation  of  the  material,  it  is  possible  that  attenuation  map  could  be  used  to 
determine  the  parameter  r. 

The  convolution  in  (6.20)  can  be  expressed  in  a matrix  notation:  for 


(6.20),  the  product  k(S!T)(u)win+1\t  — u ) equals  the  proportion  of  photon  pairs  that 


(h.,T)  * +1)m  = [B^kis,r)]t  , t = 1,2, ...  ,r,  (6.21) 


where  fc(S)T)  = [^(s,t)(~t’  + 1)  fc(s,r)(— T + 2)  • • • k(s^T){r  — 1)]'  and  is  a known 

T x (2 r — 1)  matrix  that  is  of  the  form 


0 


0 


0 


0 


Thus,  ys  can  be  approximated  as 


y 


(6.22) 
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With  (6.22),  the  minimization  problem  in  (6.16)  is  now 


iSn+1') 


A 

= arg  mm 
k{a,  t)>0 


KL  (ys,Bln+Vk{S}T)  + zs) 


2r— 1 

subject  to  [fc(s>r)]t  < 1 • 

t=l 


(6.23) 


The  constrained  optimization  problem  in  (6.23)  is  difficult  to  solve  because  of  the 
constraint  < 1-  Consequently,  we  first  solve  the  following  optimization 

problem: 


* ", t)  = arS  , min  KL(?/S>  #i”+1)*W)  + za)  . 
fc(,,r)>0 


(6.24) 


Then,  to  get  kj™T)  , we  normalize  so  that  the  sum  of  its  elements  equals  one: 

t = l,2,...,(2r-  1)  . (6.25) 


qs,T)  > 

a ,«-(n+1) 

F(s,r)  I* 


(n+1)-]  A [^(s.r)  it 

yitr-lrTfr+lh 
2-jt= 1 lK(s,r)  Jt 


The  minimization  problem  in  (6.24)  can  be  solved  by  the  MLEM  algorithm  [16,26] 
because  it  has  the  same  form  as  the  optimization  problem  in  Step  1.  Specifically, 
given  an  initial  estimate  > 0.  the  iteration  is  as  follows:  for  m = 0, 1, . . . M2, 

dn’m)l  T r«(n+l)i 


(n,ra+l)i 


l *W) 


. , j = l,2,...,(2r-l).  (6.26) 


Er„[sS”+1)]«1rr[8S"+l>fc!;;”,  + ^1. 


We  define  k[™l^  = and  normalize  using  (6.25)  to  get  fc|"+1) 


“(s.t) 


I'(s.t)  ■ 
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Given  we  determine  ICi"  1 " by  using  the  following  equation: 


fc(n+ 1)  _ 


\k{n+1)  1 

JT 

LK(s,r)  JT_i 

Jt+1 

LK(s,t)  Jt 

\k{n+}]} 

[K(s,r)  J2r_i 

>2t-2 

0 

\k{n+1)] 

• \k(n+})] 

1K(s,t)  i2r_1 

\S>T)  T+l 

0 

0 

0 

0 

0 

0 

0 


[iu(n+1)l 
[K(s,r)  lT 


. (6.27) 


Finally,  is  defined  as  a block  diagonal  matrix  with  {/Ci"+1^}  along  its  diagonal: 


]Q(n+l) 


IC[n+1)  0 0 0 

o /c{2n+1)  o o 


(6.28) 


0 0 0 

o oo  /d”+1) 

Repeating  Steps  1 and  2 generate  the  estimates  of  K and  x.  Once  the  estimate 
of  K is  available,  denoted  by  /C,  a regularized  image  reconstruction  algorithm,  such 
as  the  PML,  APML,  and  QEP  algorithm,  can  be  used  to  estimate  the  emission  mean 
vector  x.  More  specifically,  we  first  let  our  estimate  for  'plvnc  be  V = 1CVC.  Then, 
the  probability  matrix  V is  used  in  the  image  reconstruction  algorithm  of  choice. 

In  summary,  the  Steps  of  the  PCiPS  algorithm  are:  for  n = 0, 1,  2, . . . 

• Step  1 Get  an  initial  estimate  for  the  emission  mean  vector  x^  > 0 and  an 
initial  estimate  for  the  scatter  matrix  K, ^ 

• Step  2 Get  the  current  estimate  of  the  emission  mean  vector  x<'n+1'>  using  K^n\ 
which  is  the  current  estimate  of  the  scatter  matrix  fC: 

^(n+1)  _ arg  mjn  KL (d,  K^Vcx  + p)  . (6.29) 

*>o 

• Step  3 For  s — 1,  2, . . . , S,  get  using  (6.26). 
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• Step  4 For  s = 1,2 , . . . ,S,  normalize  using  (6.25)  to  get 

• Step  5 For  s = 1,2 , . . . , 5,  get  JC i”+1)  using  (6.27). 

• Step  6 Get  /G"+1)  using  (6.28). 

• Step  7 Repeat  Steps  2 through  6 for  a chosen  number,  M,  of  iterations. 

• Step  8 Define  P" ' = /c(M+1)pc  and  use  f>  jn  the  APML  algorithm  or  some 

other  algorithm  of  choice. 


CHAPTER  7 

SIMULATIONS  AND  EXPERIMENTAL  STUDY 
To  evaluate  the  algorithms  in  Chapters  3,  4,  and  5,  and  the  method  in  Chapter  6, 
we  applied  them  to  real  thorax  phantom  data  and  compared  them  quantitatively  and 
qualitatively  to  certain  existing  algorithms.  Also,  in  Section  7.1  simulation  studies 
with  computer-generated  synthetic  data  are  presented  for  the  PML  and  QEP  algo- 
rithms in  Chapters  3 and  5,  respectively.  It  should  be  noted  that  simulation  results 
with  the  synthetic  data  have  limitations  because  they  are  generated  under  the  as- 
sumption that  the  system  model  for  emission  data  in  Section  1.4  is  exactly  correct. 
However,  the  system  model  is  not  perfect  due  to  errors  discussed  in  Section  1.3. 

Thorax  phantom  data  was  obtained  from  the  PET  laboratory  at  the  Emory 
University  School  of  Medicine.  The  phantom  was  filled  with  2-[18F]fluoro-2-deoxy- 
d-glucose  ([18F]FDG)  and  scanned  using  a Siemens-CTI  ECAT  EXACT  [model  921] 
scanner  in  slice-collimated  mode  (i.e.,  septa-extended  mode).  Thirty  independent 
data  sets  were  generated  from  multiple  scans  of  duration  7 minutes.  Fifteen  re- 
alizations of  14  min  data  were  generated  by  adding  non-overlapping  two  7 min 
data  sets.  The  [18F]FDG  concentration  for  the  heart  wall,  heart  cavity,  liver,  three 
tumors,  and  thorax  cavity  of  the  thorax  phantom  by  Data  Spectrum  Inc.,  were 
0.72/xCi/ml,  0.23/rCi/ml,  0.72/iCi/ml,  2.01//Ci/ml,  and  0.24/rCi/ml,  respectively.  The 
lungs,  which  contained  styrofoam  beads,  were  filled  with  a 0.25/iCi/ml  solution  of 
[18F]FDG.  The  concentrations  were  chosen  to  mimic  those  observed  in  whole-body 
scans.  The  tumors  were  of  size  1 cm,  1.5  cm,  and  2 cm.  The  sinogram  consists  of  160 
radial  bins  and  192  angles.  The  physical  dimensions  of  the  image  space  is  43.9  x 43.9 
cm2  and  the  reconstructed  images  contain  128  x 128  voxels  (voxel  size  is  3.43  x 3.43 
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mm2).  Two  planes  (10  and  21)  were  considered  in  the  experiments.  Plane  10  con- 
tains activity  due  to  the  heart,  lungs,  spine,  and  background,  while  plane  21  contains 
activity  due  to  the  heart,  two  tumors  (1.5  cm  and  2.0  cm),  and  background.  The 
total  number  of  prompts  for  planes  10  and  21  was  397, 000  and  340, 000,  respectively, 
for  14  minute  data.  The  randoms  makeup  about  10%  and  12%  of  the  data  for  planes 
10  and  21,  respectively. 

The  probability  matrix  V was  computed  using  the  angle-of-view  method  [16] 
with  corrections  for  errors  due  to  attenuation  and  detector  inefficiency.  To  get  the 
attenuation  correction  factors,  post-injected  transmission  scan  data  was  collected  for 
three  minutes  and  the  attenuation  correction  method  by  Anderson  et  al.  [9]  was 
employed.  A normalization  file  was  used  to  correct  for  detector  inefficiency.  Finally, 
the  randoms  were  used  as  noise  free  estimates  of  the  mean  numbers  of  accidental 
coincidences. 

For  all  of  the  experiments  and  simulations,  we  used  a uniform  initial  estimate 
(all  voxels  equal  J),  the  eight  nearest  neighbors  of  the  jth  voxel  were  used  for 

Nj,  and  the  weights,  {iHjk}-  are  one  for  horizontal  and  vertical  nearest  neighbors  and 
l/\/2  for  diagonal  nearest  neighbors. 

In  Section  7.1,  we  applied  the  PML  and  QEP  algorithms  to  real  thorax  phantom 
data  and  computer-generated  synthetic  data,  and  compared  them  quantitatively  and 
qualitatively.  Then,  performance  of  the  APML  algorithm  will  be  evaluated  in  Section 
7.2.  Finally,  experimental  results  with  the  probability  estimation  method  in  Chapter 
6 will  be  presented  in  Section  7.3. 

7.1  Regularized  Image  Reconstruction  Algorithms 

In  this  section,  we  compare  the  PML  and  QEP  algorithms,  quantitatively  and 
qualitatively,  to  the  MLEM  algorithm  and  a penalized  weighted  least-squares  (PWLS) 
algorithm  [42],  Two  ad-hoc  forms  of  regularization  include  the  post-filtering  of  MLEM 
estimates  and  early  termination  of  the  MLEM  algorithm  (usually  quite  far  from  the 
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MLEM  estimate).  Given  their  simplicity,  we  also  compared  the  post-filtering  (MLEM- 
F)  and  early-stopping  (MLEM-S)  strategies  to  the  proposed  algorithms. 

Quantitative  comparisons  were  made  using  contrast  as  a figure-of-merit: 

Contrast  4 ' ~ Mfll  , (7.1) 

Mb 

where  M bo i and  Mb  denote  the  mean  of  a chosen  region-of-interest  (ROI)  and  back- 
ground, respectively.  We  define  another  figure-of-merit  that  quantifies  the  distin- 
guishability  of  the  two  tumors: 

Distinguishability  = ^ , (7.2) 

\lVlrp  — IVIb  I 

where  Mr  and  Mj  denote  the  mean  activity  of  the  two  tumor  regions  and  intermediate 
region  between  the  two  tumors,  respectively.  If  two  tumors  overlap  each  other,  then 
Mj  = MT  and  the  distinguishability  will  be  zero.  On  the  other  hand,  if  intermediate 
region  between  the  two  tumors  has  the  same  mean  as  the  background,  then  the 
distinguishability  will  be  one. 

For  image  comparisons,  converged  images  were  used  for  the  MLEM,  PML,  and 
PWLS  algorithms.  The  QEP  images  result  by  running  the  QEP  algorithm  for  the 
same  number  of  iterations  as  the  PML  algorithm.  This  was  necessary  because  the 
QEP  algorithm  does  not  have  a single  objective  function.  Recall  that  the  QEP 
algorithm  defines  a new  objective  function  to  be  minimized  at  each  iteration.  For  the 
PML,  QEP,  and  PWLS  images,  the  penalty  parameter,  (3,  was  chosen  in  such  a way 
that  the  standard  deviation  of  their  soft-tissue  (i.e. , background)  regions  were  equal. 
In  this  sense,  the  algorithms  are  “balanced”  with  respect  to  (3.  For  the  MLEM-S  and 
MLEM-F  images,  early  MLEM  iterates  and  filtered  converged  MLEM  images  were 
obtained  that  matched  the  standard  deviation  of  the  soft-tissue  regions  in  the  PML, 
QEP,  and  PWLS  images.  To  filter  the  MLEM  image,  we  used  5x5  Gaussian  Liters 
with  different  standard  deviation  values. 
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7.1.1  Synthetic  Data 

In  this  subsection,  we  present  simulation  results  for  software  phantoms.  Fifty 
realizations  were  used  in  the  simulation  study.  To  generate  emission  data,  a software 
phantom  was  forward  projected  (i.e.,  Vx)  using  the  V matrix,  where  it  was  assumed 
that  there  are  no  errors  except  accidental  coincidences.  Then,  for  each  bin,  the 
prompts  and  randoms  were  generated  using  pseudo-random  Poisson  variates  with 
mean  [Px\ * + p and  p,  respectively.  The  constant  p was  chosen  such  that  the  mean 
accidental  coincidence  rate  was  approximately  10%.  The  mean  number  of  prompts 
and  randoms  are  about  550, 000  and  50, 000,  respectively.  For  simulation  studies, 
dimensions  of  the  image  space  follow  the  corresponding  ones  of  the  real  phantom 
image  space.  The  total  number  of  intensities  within  a software  phantom  was  about 
500, 000. 

We  first  consider  a tumor  phantom.  Figure  7 1 shows  a software  phantom  that 
consists  of  two  tumors  (1.7  cm  and  2.4  cm  in  diameter)  in  a uniform  circular  back- 
ground with  a diameter  of  30.5  cm,  where  two  tumors  and  background  intensities 
are  7 x 74  and  74,  respectively  (tumor  contrast  is  6).  The  image  also  depicts  regions 
for  contrast  and  distinguishability  calculation.  The  intermediate  region  between  the 
two  tumors  consists  of  14  voxels  and  the  large  and  small  tumors  consist  of  45  and  21 
voxels,  respectively. 

For  the  tumor  phantom  in  Figure  7 1,  we  used  the  log-cosh  penalty  function 
A (t)  = log(cosh(|))  with  5 = 20  in  the  PML  and  QEP  algorithms,  while  the  quadratic 
penalty  function  was  used  in  the  PWLS  algorithm.  For  the  QEP  algorithm,  77(f)  = 
£tanh(|)  was  used  with  £ = 150  unless  noted.  The  parameters  5 and  £ were  chosen 
experimentally. 

Figure  7 2 is  a plot  of  the  images  obtained  by  the  MLEM,  MLEM-S,  MLEM-F, 
PML,  QEP,  and  PWLS  algorithms  when  the  tumor  phantom  in  Figure  7 1 was  used. 
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The  MLEM  image  is  the  500f/'  MLEM  iterate  and  200  iterations  were  used  to  recon- 
struct the  PML,  QEP,  and  PWLS  images.  For  the  PML,  QEP,  and  PWLS  images, 
/ 3 was  chosen  such  that  the  standard  deviation  of  the  background  was  approximately 
12.  The  MLEM-S  image  is  the  20th  MLEM  iterate.  To  get  the  MLEM-F  image, 
the  MLEM  image  in  Figure  7 2 (a)  was  filtered  once  using  a 5 x 5 Gaussian  filter 
with  standard  deviation  of  1.27  in  voxels.  As  stated,  all  of  the  images  in  Figure  7 2 
have  the  same  background  standard  deviation  except  for  the  MLEM  image.  The 
MLEM  image  in  Figure  7 2 (a)  is  considerably  noisy  (background  standard  deviation 
is  about  78)  compared  to  the  other  images.  Figures  7 2 (d)  and  (e)  illustrate  that 
the  PML  image  and  the  QEP  image  are  smooth  and,  at  the  same  time,  the  tumors  in 
the  images  are  resolvable  and  differ  greatly  from  the  background.  On  the  other  hand, 
Figures  7 2 (c)  and  (f)  demonstrate  that  the  images  generated  by  the  MLEM-F  and 
PWLS  algorithm  are  too  smooth,  especially  near  the  boundary  of  the  tumors.  Figure 
7 2 (b)  shows  that  edges  of  the  tumors  in  the  MLEM-S  image  are  not  as  clear  as 
the  ones  in  the  PML  and  QEP  images.  In  Figure  7 3,  the  images  in  Figure  7 2 are 
plotted  with  their  own  dynamic  range. 

For  the  images  in  Figure  7 2,  the  contrast  of  the  QEP  image  was  -1%,  12%, 
21%,  4%,  and  22%  higher  than  the  MLEM,  MLEM-S,  MLEM-F,  PML,  and  PWLS 
images,  respectively,  for  the  large  tumor.  The  increased  contrast  of  the  QEP  im- 
age for  the  small  tumor  with  respect  to  the  MLEM,  MLEM-S,  MLEM-F,  PML,  and 
PWLS  images  was  —4%,  16%,  31%,  5%,  and  34%,  respectively.  The  increased  tumor 
distinguishability  of  the  QEP  image  with  respect  to  the  MLEM,  MLEM-S,  MLEM- 
F,  PML,  and  PWLS  images  was  —5%,  16%,  34%,  5%,  and  40%,  respectively.  The 
MLEM  image  outperformed  the  QEP  image  in  contrast  and  distinguishability  com- 
parison. However,  as  can  be  seen  in  Figure  7 2 (a),  the  MLEM  image  is  extremely 


noisy. 
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Figures  7 4 (a)  and  (b)  are  line  plots  (the  row  is  shown  in  Figure  7 1)  of  the 
images  in  Figure  7 2.  For  the  row  under  consideration,  it  can  be  seen  from  the  line 
plots  that  the  PML  and  QEP  images  have  a higher  degree  of  contrast  than  the  other 
images,  except  for  the  MLEM  image.  And.  the  edges  in  the  QEP  image  are  sharper 
than  those  in  the  PML  image.  As  expected,  the  MLEM  image  is  excessively  noisy 
from  the  line  plot. 

Figures  7 5 (a)  and  (b)  are  plots  of  the  average  contrast  of  the  large  tumor  and 
small  tumor  versus  the  average  background  standard  deviation  using  fifty  realizations, 
respectively,  when  the  tumor  phantom  in  Figure  7 1 was  used.  Further,  a plot  of 
the  average  standard  deviation  of  the  large  tumor  versus  the  average  background 
standard  deviation  for  the  fifty  realizations  is  shown  in  Figure  7 5 (c).  Finally,  in 
Figure  7 5 (d),  a plot  of  the  average  distinguishability  of  two  tumors  versus  the 
average  background  standard  deviation  for  fifty  realizations  is  shown.  As  can  be  seen 
from  Figures  7 5 (a),  (b),  and  (d),  the  contrast  curves  and  the  distinguishability  curve 
of  the  QEP  algorithm  lie  above  the  curves  of  the  other  algorithms  for  comparable 
“background  noise”.  Thus,  for  a fixed  level  of  comparable  background  noise,  the 
QEP  images,  on  average,  have  the  greatest  contrast  and  distinguishability.  The 
average  standard  deviation  curves  of  the  PML  and  QEP  algorithms  in  Figure  7 5 (c) 
generally  lie  below  the  corresponding  curves  of  the  other  algorithms  for  reasonably 
small  background  noise. 

To  see  where  the  PML  and  QEP  algorithms  break  down  in  terms  of  contrast, 
we  performed  simulations  using  the  synthetic  tumor  phantom  in  Figure  7 1 with 
four  different  tumor  contrast  values  (tumor  contrast  equals  3,  1.5,  0.75,  and  0.5). 
For  the  PML  and  QEP  algorithms,  j3  — 1/16  and  1/32  were  used,  respectively,  and 
200  iterations  were  used.  For  the  QEP  algorithm,  £ = 150  was  used  for  tumor 
contrast  of  3,  whereas  £ — 80  was  used  for  the  other  tumor  contrast  values  (i.e., 
tumor  contrast  equals  1.5,  0.75,  and  0.5).  The  parameters  (3  and  £ were  chosen 
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experimentally.  Figures  7 6 (a)  and  (b)  are  the  PML  and  QEP  images,  respectively, 
obtained  by  using  the  phantom  in  Figure  7 1 with  tumor  contrast  of  3.  As  can  be 
seen  in  the  figures,  when  tumor  contrast  was  3,  the  tumors  in  the  PML  and  QEP 
images  are  clearly  resolvable.  Figures  7 6 (c)  and  (d)  are  the  PML  and  QEP  images, 
respectively,  obtained  by  using  the  phantom  in  Figure  7 1 with  tumor  contrast  of  1.5. 
From  the  figures,  the  tumors  in  the  PML  and  QEP  images  are  clear  enough  when 
tumor  contrast  was  1.5.  It  should  be  mentioned  that  the  images  in  Figure  7 6 have 
the  same  standard  deviation  of  the  background  that  is  approximately  10.  Figures 
7 7 (a)  and  (b)  are  the  PML  and  QEP  images,  respectively,  obtained  by  using  the 
phantom  in  Figure  7 1 with  tumor  contrast  of  0.75.  From  the  figures,  the  tumors  in 
the  PML  and  QEP  images  are  still  resolvable  when  tumor  contrast  was  0.75.  Consider 
the  PML  and  QEP  images  in  Figures  7 7 (c)  and  (d),  respectively,  that  are  obtained 
by  using  the  phantom  in  Figure  7 1 with  tumor  contrast  of  0.5.  The  tumors  in  the 
PML  and  QEP  images  are  hardly  resolvable  which  implies  that  the  PML  and  QEP 
algorithms  break  down  when  tumor  contrast  was  0.5.  The  images  in  Figure  7 7 have 
the  same  standard  deviation  of  the  background  that  is  approximately  9. 

Figures  7 8 (a),  (b),  (c),  and  (d)  are  plots  of  the  average  contrast  of  the  small 
tumor  versus  the  average  background  standard  deviation  using  fifty  realizations,  when 
tumor  contrast  of  the  phantom  in  Figure  7 1 were  3,  1.5,  0.75,  and  0.5,  respectively. 
As  can  be  seen  from  Figures  7 8 (a)  and  (b),  the  contrast  curves  of  the  QEP  algorithm 
he  above  the  curves  of  the  PML  algorithm  for  a fixed  background  noise  when  tumor 
contrast  equals  3 and  1.5.  For  tumor  contrast  of  0.75  and  0.5,  the  curves  of  the 
PML  and  QEP  algorithm  coincide  as  can  be  seen  in  Figures  7-8  (c)  and  (d)  which 
implies  that  the  PML  and  QEP  algorithms  perform  similar  to  each  other  when  tumor 
contrast  is  small  enough. 

To  see  where  the  PML  and  QEP  algorithms  break  down  in  terms  of  spacing 
between  two  tumors,  we  performed  simulations  using  four  different  synthetic  tumor 
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phantoms  where  each  phantom  has  different  tumor  spacing.  We  compared  the  PML 
and  QEP  images  both  visually  and  in  terms  of  distinguishability.  Figures  7 9 (a), 
(b),  and  (c)  show  software  phantoms  that  consist  of  two  tumors  (each  tumor  equals 
3x3  voxels).  For  the  phantoms  in  Figures  7 9 (a),  (b),  and  (c),  tumor  contrast  is 
3 and  the  spacing  between  two  tumors  is  2,  3,  and  4 voxels,  respectively.  Figure  7 9 
(d)  shows  a software  phantoms  that  consists  of  two  tumors  (each  tumor  equals  3x3 
voxels)  with  the  spacing  of  2 voxels,  where  tumor  contrast  is  6.  The  images  in  Figure 
7 9 also  depict  regions  for  distinguishability  calculation.  The  intermediate  regions 
between  the  two  tumors  consist  of  6,  8,  10,  and  6 voxels  for  the  phantoms  in  Figures 
7 9 (a),  (b),  (c)  and  (d),  respectively. 

Figure  7 10  is  a plot  of  the  PML  and  QEP  images  obtained  by  using  the  phan- 
toms in  Figures  7 9 (a)  and  (b).  Figure  7 11  is  a plot  of  the  PML  and  QEP  images 
obtained  by  using  the  phantoms  in  Figures  7 9 (c)  and  (d).  The  PML  and  QEP  im- 
ages in  Figures  7 10  and  7 11  were  from  200  iterations.  The  PML  and  QEP  images 
in  Figures  7 10  and  7 11  have  the  same  standard  deviation  of  the  background  that 
is  approximately  10  and  9,  respectively.  For  the  PML  and  QEP  images  in  Figures 
7 10  and  7 11,  (3  = 1/16  and  1/32  were  used,  respectively.  Figure  7 10  indicates 
that  the  PML  and  QEP  algorithms  generate  images  where  the  tumors  are  clearly 
separated  when  the  spacing  between  the  tumors  is  3 and  4 in  voxels.  As  can  be  seen 
in  Figures  7 11  (a)  and  (b),  the  PML  and  QEP  algorithms  were  not  able  to  resolve 
the  two  tumors  when  the  spacing  between  the  tumors  is  2 in  voxels.  However,  when 
the  tumor  contrast  was  increased,  the  PML  and  QEP  algorithms  worked  well  when 
the  spacing  between  the  tumors  is  2 in  voxels  as  shown  in  Figures  7 11  (c)  and  (d). 

Figure  7 12  is  a plot  of  the  average  distinguishability  of  two  tumors  versus  the 
average  background  standard  deviation  using  fifty  realizations,  when  the  tumor  phan- 
toms in  Figure  7 9 were  used.  As  can  be  seen  from  Figures  7 12  (a),  (b),  (c),  and 
(d),  the  distinguishability  curves  of  the  QEP  algorithm  lie  above  the  curves  of  the 


87 


Figure  7 1:  A software  tumor  phantom  is  shown.  Contrast  of  the  tumors  in  the 
phantom  is  6.  The  regions  surrounded  by  the  dotted  and  dashed  lines  define  the 
tumor  intermediate  region  (i.e.,  M/)  and  background  region,  respectively. 

PML  algorithm  for  a fixed  background  noise.  Thus,  for  a fixed  level  of  background 
noise,  the  QEP  images,  on  average,  have  greater  distinguishability. 

7.1.2  Real  Data 

In  this  subsection,  we  present  experimental  results  using  real  phantom  data  for 
plane  21.  Unless  noted,  the  data  are  from  14  minute  scans.  The  image  in  Figure 
7 13,  which  was  produced  by  averaging  fifteen  converged  MLEM  images,  depicts  the 
regions  that  were  used  in  the  contrast  and  distinguishability  calculations.  In  the  large 
and  small  tumor  ROIs  and  intermediate  region,  the  number  of  voxels  were  24,  12, 
and  16,  respectively. 

In  the  PML  and  QEP  algorithms,  we  used  the  log-cosh  penalty  function  A (t)  = 
log(cosh(|))  with  5 = 50.  Since  noise  in  real  data  is  stronger  than  the  synthetic 
data  due  to  errors  (e.g.,  scatter),  we  increased  the  value  of  the  parameter  5 to  reduce 
background  noise  (observe  that  we  used  5 = 20  for  the  synthetic  phantom  in  Figure 
7 1).  For  the  QEP  algorithm,  r](t)  = £ tanh(|)  was  used  with  £ = 500  and  1000  for  7 
minute  and  14  minute  real  data,  respectively.  We  varied  £ because  the  data  sets  lead 
to  reconstructed  images  that  have  different  edge  “heights”.  As  in  the  simulations 
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(a)  MLEM 


(b)  MLEM-S 


m m 


(c)  MLEM-F 


(d)  PML 


(e)QEP 


(f)  PWLS 


Figure  7 2:  Comparison  of  emission  images  when  the  synthetic  phantom  in  Figure 
7 1 was  used:  (a)  MLEM  image,  (b)  MLEM-S  image,  (c)  MLEM-F  image,  (d)  PML 
image,  (e)  QEP  image,  and  (f)  PWLS  image.  The  images  in  (a)  and  (b)  are  from 
500  and  20  iterations,  respectively,  while  the  images  in  (d),  (e),  and  (f)  are  from 
200  iterations.  The  image  in  (c)  was  obtained  from  filtering  the  MLEM  image  once 
with  a 5 x 5 Gaussian  filter  with  standard  deviation  of  1.27  in  voxels.  For  the  PML, 
QEP,  and  PWLS  images,  /?  was  chosen  in  such  a way  that  the  standard  deviation 
of  the  background  is  approximately  12.  Specifically,  (3  = 0.0415,  0.021,  and  0.006 
for  the  PML,  QEP,  and  PWLS  images,  respectively.  The  standard  deviation  of  the 
background  of  images  in  (b)  and  (c)  is  also  approximately  12.  For  display  purposes, 
all  the  images  were  adjusted  so  that  they  have  the  same  dynamic  range. 
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(a)  MLEM 


(b)  MLEM-S 


(c)  MLEM-F 


(d)  PML 


• • 


(e)  QEP 


(f)  PWLS 


* * 


Figure  7 3:  The  images  in  Figure  7 2 are  shown  with  their  own  dynamic  range. 
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(a) 


(b) 


Figure  7-4:  A line  plot  comparison  of  the  reconstructed  images  in  Figures  7-2  (a), 
(b),  and  (c)  is  shown  in  (a).  A line  plot  comparison  of  the  reconstructed  images  in 
Figures  7 2 (d),  (e),  and  (f)  is  shown  in  (b). 
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(c)  Large  Tumor 


(d)  Tumors 


Figure  7 5:  Plots  of  the  average  contrast  of  the  large  and  small  tumors  versus  the 
average  background  standard  deviation  are  shown  in  (a)  and  (b),  respectively,  when 
the  synthetic  phantom  in  Figure  7 1 was  used.  A plot  of  the  average  standard 
deviation  of  the  large  tumor  versus  the  average  background  standard  deviation  is 
shown  in  (c).  In  (d),  a plot  of  the  average  distinguishability  of  two  tumors  versus  the 
average  background  standard  deviation  is  shown.  Fifty  synthetic  data  realizations 
were  used  in  the  study.  For  the  MLEM-S  curves,  the  images  from  iterations  5 — 160 
were  used.  For  the  MLEM-F  curves,  the  MLEM  images  were  filtered  once  by  5 x 5 
Gaussian  filters  with  a standard  deviation  range  of  0.44  — 3.0  voxels  (each  voxel 
is  3.43  x 3.43  mm2).  For  the  PML,  QEP,  and  PWLS  algorithms,  the  images  were 
reconstructed  using  two  hundred  iterations  for  (3  = 2 1 — 2“9,  2~4— 2~9,  and  2"4— 2-12, 
respectively. 
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(a)  PML,Contrast=3 


(b)  QEP,Contrast=3 


(c)  PML, Contrasts. 5 


(d)  QEP, Contrasts. 5 


Figure  7 6:  Comparison  of  emission  images:  (a)  PML  image  and  (b)  QEP  image 
when  tumor  contrast  of  the  phantom  in  Figure  7 1 was  3,  and  (c)  PML  image  and 
(d)  QEP  image  when  tumor  contrast  of  the  phantom  in  Figure  7 1 was  1.5.  All  images 
are  from  200  iterations  and  they  have  the  same  background  standard  deviation  that 
is  approximately  10.  For  the  PML  and  QEP  images,  f3  = 1/16  and  1/32  were  used, 
respectively.  For  the  QEP  images  in  (b)  and  (d),  £ = 150  and  £ = 80  were  used, 
respectively.  For  display  purposes,  each  set  of  images  (i.e.,  (a)  and  (b),  (c)  and  (d)) 
were  adjusted  so  that  they  have  the  same  dynamic  range. 
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(a)  PML,Contrast=0.75 


(b)  QEP,Contrast=0.75 


(c)  PML, Contrasts. 5 


(d)  QEP,Contrast=0.5 


Figure  7 7:  Comparison  of  emission  images:  (a)  PML  image  and  (b)  QEP  image 
when  tumor  contrast  of  the  phantom  in  Figure  7 1 was  0.75,  and  (c)  PML  image 
and  (d)  QEP  image  when  tumor  contrast  of  the  phantom  in  Figure  7 1 was  0.5.  All 
images  are  from  200  iterations  and  they  have  the  same  background  standard  deviation 
that  is  approximately  10.  For  the  PML  and  QEP  images,  (5  = 1/16  and  1/32  were 
used,  respectively.  For  the  QEP  images  in  (b)  and  (d),  £ = 80  was  used.  For  display 
purposes,  each  set  of  images  (i.e.,  (a)  and  (b),  (c)  and  (d))  were  adjusted  so  that  they 
have  the  same  dynamic  range. 
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(a)  Contrast=3  (b)  Contrasts. 5 


(c)  Contrasts. 75 


(d)  Contrast=0.5 


Figure  7 8:  Plots  of  the  average  contrast  of  the  small  tumor  versus  the  average  back- 
ground standard  deviation  are  shown  in  (a),  (b),  (c),  and  (d)  when  the  synthetic 
phantom  in  Figure  7 1 was  used  with  tumor  contrast  of  3,  1.5,  0.75  and  0.5,  re- 
spectively. Fifty  synthetic  data  realizations  were  used  in  the  study.  For  the  PML 
and  QEP  algorithms,  the  images  were  reconstructed  using  two  hundred  iterations  for 
(3  = 2_1  — 2~9.  For  the  QEP  curves  in  (b),  (c),  and  (d),  £ = 80  was  used,  whereas 
£ = 150  was  used  for  the  QEP  curves  in  (a). 
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(a)  Distance=4,Contrast=3 


(b)  Distance=3,Contrast=3 
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(c)  Distance=2,Contrast=3 


(d)  Distance=2,Contrast=6 
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Figure  7 9:  Software  tumor  phantoms  are  shown  for  different  spacing  between  two 
tumors.  For  the  phantoms  in  (a),  (b),  (c),  and  (d),  spacing  between  tumors  are  2,  3, 
4,  and  2 in  voxels,  respectively.  Tumor  contrast  is  3,  3,  3,  and  6 for  the  phantoms 
in  (a),  (b),  (c),  and  (d),  respectively.  The  regions  surrounded  by  the  dotted  and 
dashed  lines  define  the  tumor  intermediate  region  (i.e.,  Mj ) and  background  region, 
respectively.  The  images  are  shown  with  their  own  dynamic  range. 
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(a)  PML,Distance=4,Contrast=3 


(b)  QEP,Distance=4,Contrast=3 


(c)  PML,Distance=3,Contrast=3 


(d)  QEP,Distance=3,Contrast=3 


Figure  7 10:  Comparison  of  emission  images:  (a)  PML  image  and  (b)  QEP  image 
when  the  phantom  in  Figure  7 9 (a)  was  used,  and  (c)  PML  image  and  (d)  QEP  image 
when  the  phantom  in  Figure  7 9 (b)  was  used.  All  images  are  from  200  iterations 
and  they  have  the  same  standard  deviation  of  the  background  that  is  approximately 
10.  For  the  PML  and  QEP  images,  (3  = 1/16  and  1/32  were  used,  respectively.  For 
display  purposes,  each  set  of  images  (i.e.,  (a)  and  (b),  (c)  and  (d))  were  adjusted  so 
that  they  have  the  same  dynamic  range. 
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(a)  PML,Distance=2,Contrast=3 


(b)  QEP,Distance=2,Contrast=3 


(c)  PML,Distance=2,Contrast=6 


(d)  QEP,Distance=2,Contrast=6 


Figure  7 11:  Comparison  of  emission  images:  (a)  PML  image  and  (b)  QEP  image 
when  the  phantom  in  Figure  7 9 (c)  was  used,  and  (c)  PML  image  and  (d)  QEP  image 
when  the  phantom  in  Figure  7 9 (d)  was  used.  All  images  are  from  200  iterations 
and  they  have  the  same  standard  deviation  of  the  background  that  is  approximately 
9.  For  the  PML  and  QEP  images,  (3  = 1/16  and  1/32  were  used,  respectively.  For 
display  purposes,  each  set  of  images  (i.e.,  (a)  and  (b),  (c)  and  (d))  were  adjusted  so 
that  they  have  the  same  dynamic  range. 
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(a)  Distance=4,  Contrast=3  (b)  Distance=3,  Contrast=3 


(c)  Distance=2,  Contrast=3 


(d)  Distance=2,  Conlrast=6 


Figure  7 12:  Plots  of  the  average  distinguishability  of  two  tumors  versus  the  average 
background  standard  deviation  are  shown  in  (a),  (b),  (c),  and  (d)  when  the  synthetic 
phantoms  in  Figures  7 9 (a),  (b),  (c),  and  (d)  were  used,  respectively.  Fifty  synthetic 
data  realizations  were  used  in  the  study.  For  the  PML  and  QEP  algorithms,  the 
images  were  reconstructed  using  two  hundred  iterations  for  /3  = 2-1  — 2“9. 
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with  the  synthetic  phantom,  the  quadratic  penalty  function  was  used  in  the  PWLS 
algorithm.  Note,  the  parameters  6 and  £ were  chosen  experimentally. 

Figure  7 14  is  a plot  of  the  images  for  plane  21  that  were  obtained  by  the  MLEM, 
MLEM-S,  MLEM-F,  PML,  QEP.  and  PWLS  algorithms.  The  number  of  iterations 
used  to  reconstruct  the  MLEM,  PML,  and  PWLS  images  was  500,  200,  and  200, 
respectively.  Figure  7 15  shows  that  the  PML  and  QEP  images  for  different  iteration 
numbers  when  a 14  minute  data  set  for  plane  21  was  used.  As  can  be  seen  in  Figure  7 
15,  early  iterates  of  the  PML  and  QEP  algorithms  can  be  used  because  100t/l  iterates 
are  not  much  different  from  500t/l  iterates.  As  in  the  simulated  data,  the  number  of 
iterations  for  the  QEP  algorithm  was  set  to  the  number  of  iterations  used  for  the 
PML  algorithm. 

For  the  PML,  QEP,  and  PWLS  images  in  Figure  7 14,  (3  was  chosen  so  that  the 
standard  deviation  of  their  backgrounds  are  approximately  same  (background  stan- 
dard deviation  is  approximately  68).  Using  9 iterations  and  repeating  the  application 
of  a 5 x 5 Gaussian  filter  two  times  yielded  the  MLEM-S  and  MLEM-F  images  with 
a background  standard  deviation  of  68.  Figures  7 14  (d)  and  (e)  illustrate  that  the 
tumors  in  the  PML  image  and  the  QEP  image  are  resolvable  and  differ  significantly 
from  the  background.  On  the  other  hand,  the  tumors  are  not  as  distinct  in  the  im- 
ages produced  by  the  MLEM-S,  MLEM-F,  and  PWLS  algorithms  (see  Figures  7 14 
(b),  (c),  and  (f)).  As  expected,  the  MLEM  image  in  Figure  7 14  (a)  is  considerably 
noisier  and  grainier  than  the  other  images.  In  Figure  7 16,  the  images  in  Figure  7 14 
are  plotted  with  their  own  dynamic  range. 

The  true  contrast  of  the  small  and  large  tumors  was  7.38.  Due  to  finite  reso- 
lution effects,  all  of  the  algorithms  underestimate  the  contrast  [68].  Consequently, 
the  algorithm  that  produces  the  greatest  contrast  is  to  be  preferred.  For  the  images 
in  Figure  7 14,  the  large  tumor  contrast  of  the  QEP  image  was  —14%,  37%,  32%, 
3%,  and  34%  higher  than  the  MLEM,  MLEM-S,  MLEM-F,  PML,  and  PWLS  images, 
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respectively.  The  increased  contrast  of  the  QEP  image  for  the  small  tumor  with  re- 
spect to  the  MLEM,  MLEM-S,  MLEM-F,  PML,  and  PWLS  images  was  —28%,  46%, 
46%,  9%,  and  43%,  respectively.  The  increased  tumor  distinguishability  of  the  QEP 
image  with  respect  to  the  MLEM,  MLEM-S,  MLEM-F,  PML,  and  PWLS  images 
was  —8%,  20%,  49%,  5%,  and  35%,  respectively.  Although  the  contrast  for  “true” 
MLEM  images  (i.e.,  images  obtained  when  the  MLEM  algorithm  is  not  terminated 
early)  can  be  quite  large,  the  noise  level  of  these  images  limits  their  practical  use. 

Figure  7 17  is  a line  plot  (the  row  is  shown  in  Figure  7 13)  of  the  images  in 
Figures  7 14  (b),  (d),  (e),  and  (f)  (the  image  in  Figure  7 14  (a)  is  too  noisy  and 
the  image  in  Figure  7 14  (c)  is  too  smooth).  For  the  row  under  consideration,  it  can 
be  seen  from  the  line  plots  that  the  PML  and  QEP  images  have  a higher  degree  of 
contrast  than  the  other  images.  And,  the  edges  in  the  QEP  image  are  sharper  than 
those  in  the  PML  image. 

Figures  7 18  (a)  and  (b)  are  plots  of  the  average  contrast  of  the  large  tumor 
and  small  tumor  versus  the  average  background  standard  deviation  using  fifteen  re- 
alizations for  plane  21,  respectively.  Further,  plots  of  the  average  standard  deviation 
of  the  large  tumor  and  the  average  distinguishability  of  two  tumors  versus  the  aver- 
age background  standard  deviation  for  the  fifteen  realizations  are  shown  in  Figures 
7 18  (c)  and  (d),  respectively.  Just  like  the  contrast  curves  in  the  synthetic  data 
simulations,  the  contrast  curves  of  the  QEP  algorithm  lie  above  the  corresponding 
curves  of  the  other  algorithms.  Thus,  for  a fixed  level  of  background  noise,  the  QEP 
images,  on  average,  have  the  greatest  contrast.  Not  too  surprising,  the  large  tumor 
average  standard  deviation  curves  of  the  QEP  algorithm  in  Figure  7 18  (c)  generally 
lie  above  the  corresponding  curves  of  the  PML  algorithm.  The  PWLS  algorithm  pro- 
duced large  tumor  standard  deviation  curves  that  lie  below  the  corresponding  curves 
generated  by  the  other  algorithms.  However,  the  degree  of  smoothing  produced  by 
the  PWLS  algorithm  was  too  great.  This  claim  is  supported  qualitatively  by  Figure 
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7 14  (f)  and  quantitatively  by  the  contrast  plots  in  Figures  7 18  (a)  and  (b).  The 
distinguishability  curves  of  the  PML  and  QEP  algorithms  lie  above  the  curves  of  the 
other  algorithms  when  the  background  standard  deviation  is  less  than  300.  Above 
that  point,  the  distinguishability  curves  of  the  PML  and  QEP  algorithms  lie  below 
the  curves  of  the  other  algorithms  except  the  PWLS  curve.  However,  images  with 
background  standard  deviation  greater  than  300  are  too  noisy  for  practical  use.  Thus, 
for  practical  noise  levels,  the  PML  and  QEP  images,  on  average,  have  the  greatest 
contrast  and  distinguishability. 

To  see  how  the  algorithms  would  perform  in  shorter  duration  protocols,  we  now 
consider  fifteen  realizations  of  7 minute  real  phantom  data  for  plane  21.  Figures  7 19 
(a)  and  (b)  are  plots  of  the  average  contrast  of  the  large  tumor  and  small  tumor 
versus  the  average  background  standard  deviation  using  fifteen  realizations,  respec- 
tively. A plot  of  the  average  standard  deviation  of  the  large  tumor  versus  the  average 
background  standard  deviation  for  the  fifteen  realizations  is  shown  in  Figure  7 19  (c). 
Also,  in  Figure  7 19  (d),  we  provide  a plot  of  the  average  distinguishability  of  two 
tumors  versus  the  average  background  standard  deviation  for  fifteen  realizations.  As 
in  the  experiments  with  the  14  minute  real  phantom  data,  it  is  evident  that  the  PML 
and  QEP  algorithms  outperform  the  MLEM-S,  MLEM-F,  and  PWLS  algorithms  in 
terms  of  contrast  recovery  and  tumor  distinguishability. 

Figure  7 20  is  a plot  of  the  images  obtained  by  the  MLEM,  MLEM-S,  MLEM-F, 
PML,  QEP,  and  PWLS  algorithms  for  a 7 minute  data  set  for  plane  21.  The  number 
of  iterations  used  to  reconstruct  the  MLEM,  PML,  QEP,  and  PWLS  images  was  500, 
200,  200,  and  200,  respectively.  For  the  PML,  QEP,  and  PWLS  images  in  Figure  7 20, 
(3  was  chosen  so  that  the  standard  deviation  of  their  backgrounds  are  approximately 
38.  Using  8 iterations  and  repeating  the  application  of  a 5 x 5 Gaussian  filter  two  times 
yielded  the  MLEM-S  and  MLEM-F  images  with  a background  standard  deviation  of 
38.  As  in  the  experiments  with  the  14  minute  data,  the  tumors  in  the  PML  image 
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Figure  7 13:  The  mean  of  15  MLEM  emission  images  reconstructed  from  14  minute 
data  for  plane  21.  The  boxes  indicate  the  regions  used  to  compute  the  contrast  and 
distinguishability  of  the  tumors.  The  solid  and  dashed  lines  define  the  tumors  and 
background  regions,  respectively.  The  region  surrounded  by  the  dotted  lines  defines 
the  tumor  intermediate  region  (i.e.,  Mj).  The  dotted  line  indicates  the  row  chosen 
for  the  line  plots. 

and  the  QEP  image  are  resolvable  and  differ  significantly  from  the  background.  In 
Figure  7 21,  the  images  in  Figure  7 20  are  plotted  with  their  own  dynamic  range. 

For  the  images  in  Figure  7 20,  the  large  tumor  contrast  of  the  QEP  image  was 
-20%,  37%,  25%,  5%,  and  25%  higher  than  the  MLEM,  MLEM-S,  MLEM-F,  PML, 
and  PWLS  images,  respectively.  The  increased  contrast  of  the  QEP  image  for  the 
small  tumor  with  respect  to  the  MLEM,  MLEM-S,  MLEM-F,  PML,  and  PWLS 
images  was  —58%,  34%,  27%,  5%,  and  23%,  respectively.  The  increased  tumor 
distinguishability  of  the  QEP  image  with  respect  to  the  MLEM,  MLEM-S,  MLEM- 
F,  PML,  and  PWLS  images  was  —30%,  12%,  42%,  6%,  and  25%,  respectively.  Figure 
7 22  is  a line  plot  (the  row  is  shown  in  Figure  7 13)  of  the  images  in  Figures  7 20 
(b),  (d),  (e),  and  (f).  As  in  the  experiments  with  the  14  minute  data,  it  can  be  seen 
from  the  line  plots  that  the  PML  and  QEP  images  have  a higher  degree  of  contrast 
than  the  other  images. 
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(a)  MLEM 


(b)  MLEM-S 


(c)  MLEM-F 


(d)  PML 


(e)  QEP 


(f)  PWLS 


Figure  7 14:  Comparison  of  emission  images  when  a 14  minute  real  phantom  data 
for  plane  21  was  used:  (a)  MLEM  image,  (b)  MLEM-S  image,  (c)  MLEM-F  image, 
(d)  PML  image,  (e)  QEP  image,  and  (f)  PWLS  image.  The  images  in  (a)  and  (b) 
are  from  500  and  9 iterations,  respectively,  while  the  images  in  (d),  (e),  and  (f)  are 
from  200  iterations.  The  image  in  (c)  was  obtained  from  filtering  the  MLEM  image 
two  times  with  a 5 x 5 Gaussian  filter  with  standard  deviation  of  1.95  in  voxels.  For 
the  PML,  QEP,  and  PWLS  images,  f3  was  chosen  in  such  a way  that  the  standard 
deviation  of  the  background  is  approximately  68.  Specifically,  (3  = 2“6,  2-7,  and 
0.00017  for  the  PML,  QEP,  and  PWLS  images,  respectively.  Note,  the  standard 
deviation  of  the  background  of  images  in  (b)  and  (c)  is  also  approximately  68.  For 
display  purposes,  all  the  images  were  adjusted  so  that  they  have  the  same  dynamic 
range  except  the  MLEM  image  because  it  has  very  wide  dynamic  range. 
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(a)  PML-100 


(b)  QEP-100 
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(c)  PML-200 


(d)  QEP-200 
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(e)  PML-500 


(f)  QEP-500 
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Figure  7 15:  Iteration  comparison  of  emission  images  reconstructed  from  using  a 14 
minute  real  phantom  data  for  plane  21:  (a)  PML  image  using  100  iterations,  (b) 
QEP  image  using  100  iterations,  (c)  PML  image  using  200  iterations,  (d)  QEP  image 
using  200  iterations,  (e)  PML  image  using  500  iterations,  and  (f)  QEP  image  using 
500  iterations.  For  the  PML  and  QEP  images,  /3  = 2~6  and  2~7,  respectively.  For 
display  purposes,  all  the  images  were  adjusted  so  that  they  have  the  same  dynamic 
range. 
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(a)  MLEM  (b)  MLEM-S 


(c)  MLEM-F 


(d)  PML 


(e)QEP 


(f)  PWLS 


Figure  7 16:  Emission  images  in  Figure  7 14  are  shown  with  their  own  dynamic 
range. 
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(a) 


(b) 


Figure  7 17:  A line  plot  comparison  of  the  reconstructed  images  in  Figures  7 14  (a), 
(b),  and  (c)  is  shown  in  (a).  A line  plot  comparison  of  the  reconstructed  images  in 
Figures  7 14  (d),  (e),  and  (f)  is  shown  in  (b). 
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(a)  Large  Tumor 


(b)  Small  Tumor 


(c)  Large  Tumor 


(d)  Tumors 


Figure  7 18:  Plots  of  the  average  contrast  of  the  large  and  small  tumors  versus  the 
average  background  standard  deviation  are  shown  in  (a)  and  (b),  respectively.  A plot 
of  the  average  standard  deviation  of  the  large  tumor  versus  the  average  background 
standard  deviation  is  shown  in  (c).  In  (d),  a plot  of  the  average  distinguishability 
of  two  tumors  versus  the  average  background  standard  deviation  is  shown.  Fifteen 
realizations  were  used  and  14  minute  real  phantom  data  for  plane  21  was  used  in  the 
study.  For  the  MLEM-S  curves,  the  images  from  iterations  5 — 160  were  used.  For 
the  MLEM-F  curves,  the  MLEM  images  were  filtered  once  by  5 x 5 Gaussian  filters 
with  a standard  deviation  range  of  0.44  — 3.0  voxels  (each  voxel  is  3.43  x 3.43  mm2). 
For  the  PML,  QEP,  and  PWLS  algorithms,  the  images  were  reconstructed  using  two 
hundred  iterations  for  (3  = 2“4  — 2“12,  2-4  — 2-12,  and  2-12  — 2~20,  respectively. 
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(a)  Large  Tumor  (b)  Small  Tumor 


(c)  Large  Tumor 


(d)  Tumors 


Figure  7 19:  Plots  of  the  average  contrast  of  the  large  and  small  tumors  versus  the 
average  background  standard  deviation  are  shown  in  (a)  and  (b),  respectively.  A plot 
of  the  average  standard  deviation  of  the  large  tumor  versus  the  average  background 
standard  deviation  is  shown  in  (c).  In  (d),  a plot  of  the  average  distinguishability 
of  two  tumors  versus  the  average  background  standard  deviation  is  shown.  Fifteen 
realizations  were  used  and  7 minute  real  phantom  data  for  plane  21  was  used  in  the 
study.  For  the  MLEM-S  curves,  the  images  from  iterations  5 — 160  were  used.  For 
the  MLEM-F  curves,  the  MLEM  images  were  filtered  once  by  5 x 5 Gaussian  filters 
with  a standard  deviation  range  of  0.44  — 3.0  voxels  (each  voxel  is  3.43  x 3.43  mm2). 
For  the  PML,  QEP,  and  PWLS  algorithms,  the  images  were  reconstructed  using  two 
hundred  iterations  for  (5  = 2~4  — 2“12,  2~4  — 2-12,  and  2-12  — 2~20,  respectively. 


109 


(a)  MLEM 


(b)  MLEM-S 


(c)  MLEM-F 


(d)  PML 


(e)QEP 


(f)  PWLS 


Figure  7 20:  Comparison  of  emission  images  when  a 7 minute  real  phantom  data 
for  plane  21  was  used:  (a)  MLEM  image,  (b)  MLEM-S  image,  (c)  MLEM-F  image, 
(d)  PML  image,  (e)  QEP  image,  and  (f)  PWLS  image.  The  images  in  (a)  and  (b) 
are  from  500  and  8 iterations,  respectively,  while  the  images  in  (d),  (e),  and  (f)  are 
from  200  iterations.  The  image  in  (c)  was  obtained  from  filtering  the  MLEM  image 
two  times  with  a 5 x 5 Gaussian  filter  with  standard  deviation  of  1.95  in  voxels.  For 
the  PML,  QEP,  and  PWLS  images,  (3  was  chosen  in  such  a way  that  the  standard 
deviation  of  the  background  is  approximately  38.  Specifically,  (3  = 0.029,  0.0145,  and 
0.000155  for  the  PML,  QEP,  and  PWLS  images,  respectively.  Note,  the  standard 
deviation  of  the  background  of  images  in  (b)  and  (c)  is  also  approximately  38.  For 
display  purposes,  all  the  images  were  adjusted  so  that  they  have  the  same  dynamic 
range  except  the  MLEM  image  because  it  has  very  wide  dynamic  range. 
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(e)QEP 


(f)  PWLS 


Figure  7 21:  Emission  images  in  Figure  7 20  are  shown  with  their  own  dynamic 
range. 
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(a) 


(b) 


Figure  7 22:  A line  plot  comparison  of  the  reconstructed  images  in  Figures  7 20  (a), 
(b),  and  (c)  is  shown  in  (a).  A line  plot  comparison  of  the  reconstructed  images  in 
Figures  7 20  (d),  (e),  and  (f)  is  shown  in  (b). 
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7.2  APML  algorithm 

To  evaluate  the  APML  algorithm  in  Chapter  4,  we  applied  it  to  plane  21  from 
the  real  thorax  phantom  data  (unless  noted,  the  data  are  from  14  minute  scans)  and 
compared  it  to  the  PML  algorithm  and  an  algorit  hm  by  Alin  and  Fessler  [35]  called  the 
block  sequential  regularized  expectation-maximization  (BSREM)-II  algorithm.  The 
BSREM-II  algorithm  is  a straightforward  modification  of  Ahn  and  Fessler’s  BSREM-I 
algorithm  [35].  The  BSREM-II  algorithm  results  by  setting  any  negative  element  of 
a BSREM-I  iterate  to  a small  positive  number.  The  BSREM-I  algorithm  is  based  on 
the  BSREM  algorithm  by  De  Pierro  and  Yamagishi  [33]  and  the  ordered-subsets  idea 
originally  put  forth  by  Hudson  and  Larkin  [45]. 

We  used  the  penalty  parameter  (3  = 0.02  and  0.04  for  14  minute  data  and  7 
minute  data,  respectively,  and  the  log-cosh  function  A (t)  = log(cosh(|))  with  5 = 50 
as  the  penalty  function.  The  values  of  (3  and  5 were  chosen  experimentally  such 
that  the  reconstructed  images  were  visually  “good”.  For  the  BSREM-II  algorithm, 
8 and  24  subsets,  and  the  ordering  rule  suggested  by  Ahn  and  Fessler  [35]  (i.e., 
make  projections  in  two  successive  subsets  as  perpendicular  as  possible  each  other) 
were  used.  The  ordering  rule  was  originally  introduced  by  Herman  and  Meyer  [69], 
Additionally,  the  relaxation  parameter  rule  specified  by  Ahn  and  Fessler  [35]  was  used 
in  the  implementation. 

In  Figure  7 23,  plots  of  the  cost  versus  CPU-time  are  shown  for  the  APML 
algorithm  for  different  values  of  e when  14  minute  data  were  used.  As  can  be  seen 
in  the  figure,  the  convergence  rate  of  the  APML  algorithm  depends  on  e.  Moreover, 
e = 0 generated  the  slowest  convergence  rate  for  the  considered  planes.  These  claims 
are  supported  by  Figure  7 24  in  which  plots  of  the  number  of  iterations  versus  e are 
shown  for  the  APML  algorithm  to  decrease  the  PML  objective  function  to  4>(x*), 
where  x*  is  the  5000^  iterate  of  the  PML  algorithm.  Practically  speaking,  for  the 
planes  considered,  x*  is  the  minimizer  of  the  PML  objective  function  because  the  PML 
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algorithm  did  not  decrease  the  PML  objective  function  after  about  1000  iterations 
until  5000  iterations.  For  0 < e < 0.01.  the  number  of  iterations  required  for  the 
APML  algorithm  to  decrease  the  PML  objective  function  to  <I>(a;*)  varied  from  83 
to  125  and  113  to  185  for  planes  10  and  21.  respectively.  For  e = 0,  the  number  of 
iterations  required  were  359  and  652  for  planes  10  and  21,  respectively.  For  0 < t < 
0.01,  the  decreased  convergence  time  (in  CPU  seconds)  of  the  APML  algorithm  with 
respect  to  the  PML  algorithm  was  71%  — 81%  and  75%  — 85%  for  planes  10  and  21, 
respectively.  For  e = 0,  the  decreased  convergence  time  of  the  APML  algorithm  with 
respect  to  the  PML  algorithm  was  17%  and  12%  for  planes  10  and  21,  respectively. 

In  Figure  7 25,  plots  of  the  cost  versus  CPU-time  are  shown  for  the  PML,  APML, 
and  BSREM-II  algorithms.  As  can  be  seen  in  the  figure,  the  APML  algorithm  for 
e = 0.01  decreased  the  PML  objective  function  more  than  about  three  times  faster 
than  the  PML  algorithm.  The  APML  algorithm  for  e = 0.01  needed  about  27% 
and  18%  of  the  CPU-time  that  was  necessary  for  the  PML  algorithm  to  decrease  the 
PML  objective  function  to  for  planes  10  and  21,  respectively.  At  the  early 

iterations  (CPU-time  less  than  about  10  seconds),  the  BSREM-II  algorithm  for  8 
subsets  produced  the  greatest  decrease.  However,  after  about  10  seconds  in  CPU- 
time, the  APML  algorithm  for  e = 0.01  decreased  the  PML  objective  function  faster 
than  the  other  algorithms.  It  should  be  pointed  out  that  the  BSREM-II  algorithm  did 
not  decrease  the  PML  objective  function  to  $(x*)  until  10,000  iterations,  whereas 
<f>(a:*)  was  obtained  by  the  APML  algorithm  for  e = 0.01  with  116  and  136  iterations 
for  planes  10  and  21,  respectively.  Moreover,  the  convergence  rate  of  the  BSREM-II 
algorithm  significantly  depends  on  the  number  of  subsets  as  shown  in  Figure  7 25. 

To  decrease  the  PML  objective  function  faster  at  the  early  iterations,  an  alter- 
native would  be  to  first  use  the  ordered  subsets  algorithm  for  a few  iterations  and 
then  switch  to  the  APML  algorithm.  To  do  this,  we  divide  the  emission  data  into  a 
few  subsets  (e.g.,  8 or  24)  according  to  the  ordering  rule  by  Herman  and  Meyer  [69]. 
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For  the  rth  subset,  we  define  a sub-objective  function  as  in  [35]: 

$r(*)  = {[P*]i  - d,  \og([Px  + p\i)  + pi  + log(dj!)}  + ^A(*)  , (7.3) 

teMr 

where  R is  the  number  of  subsets,  Mr  is  the  set  of  indices  corresponding  to  emis- 
sion data  within  the  rth  subset,  and  r = 1,2,...,/?.  Note  that  <3?  (a:)  — X]f=i  3v(*)- 
For  each  subset  and  the  corresponding  sub-objective  function,  a sub-iteration  is  per- 
formed sequentially.  At  the  (n,  r),h  sub-iteration,  a surrogate  function  for  the  rth 
sub-objective  function  is  constructed  using  (3.29),  (3.30),  (3.31),  and  (3.32)  at  the 
(■ n,r)th  sub-iterate  x^n,rK  The  next  sub-iterate  x(-n’r+1^  is  defined  to  be  the  nonnega- 
tive minimizer  of  the  surrogate  function.  After  one  pass  of  the  entire  sub-iterations, 
we  define  xll+l  = a;^n,fi+1)  and  £c(n+1,1)  = xn+1  for  the  next  pass  of  sub-iterations.  We 
refer  to  the  above  iteration  as  the  ordered  subset  PML  (OS-PML)  iteration. 

Figure  7 26  shows  plots  of  the  cost  versus  CPU-time  for  the  BSREM-II  algorithm 
for  8 subsets  and  the  APML  algorithm  with  a few  OS-PML  iterations  (2,  3,  and  4) 
for  8 and  24  subsets.  As  can  be  seen  in  the  figure,  the  APML  algorithm  with  a few 
OS-PML  iterations  decreases  the  PML  objective  function  faster  than  the  BSREM- 
II  algorithm  for  the  early  iterations.  Specifically,  the  APML  algorithm  with  the 
ordered-subsets  idea  needed  less  than  about  8 seconds  in  CPU-time  to  decrease  the 
PML  objective  function  to  the  cost  that  the  BSREM-II  algorithm  decreased  with  10 
seconds.  The  convergence  rate  depends  on  the  number  of  OS-PML  iterations  and  the 
number  of  subsets,  but  the  degree  of  dependency  is  not  much  great  as  shown  in  the 
figure. 

In  Figure  7 27,  iterates  for  plane  10  produced  by  the  BSREM-II  algorithms  for 
8 subsets  and  the  APML  algorithm  with  2 OS-PML  iterations  for  24  subsets  are 
shown  for  different  CPU-times.  As  shown  in  the  figure,  the  iterates  of  the  APML  and 
BSREM-II  algorithms  look  very  similar  each  other  even  though  their  associated  costs 
are  different.  Figure  7 28  shows  iterates  for  plane  21  produced  by  the  BSREM-II 
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algorithms  for  8 subsets  and  the  APML  algorithm  with  2 OS-PML  iterations  for  24 
subsets  are  shown  for  different  CPU-times. 

Now,  we  consider  7 minute  real  phantom  data.  In  Figure  7 29,  plots  of  the  cost 
versus  CPU-time  are  shown  for  the  PML.  APML.  and  BSREM-II  algorithms  when 
7 minute  data  were  used.  As  can  be  seen  in  the  figure,  the  convergence  rates  of  the 
algorithms  are  similar  to  the  ones  for  14  minute  data  except  the  BSREM-II  algorithm 
with  24  subsets  that  converged  much  slower  than  the  other  algorithms.  Again,  this 
indicates  that  the  convergence  rate  of  the  BSREM-II  algorithm  significantly  depends 
on  the  number  of  subsets.  Figure  7 30  shows  plots  of  the  cost  versus  CPU-time  for 
the  BSREM-II  algorithm  for  8 subsets  and  the  APML  algorithm  with  a few  OS-PML 
iterations  (2,  3,  and  4)  for  8 and  24  subsets  when  7 minute  data  were  used.  As  in 
the  experiments  with  the  14  minute  data,  the  APML  algorithm  with  a few  OS-PML 
iterations  decreases  the  PML  objective  function  faster  than  the  BSREM-II  algorithm 
with  8 subsets  for  the  early  iterations.  For  7 minute  data  case,  we  do  not  include  the 
corresponding  images  because  the  visual  comparisons  were  similar  to  those  observed 
in  the  images  for  14  minute  data. 

To  see  whether  convergence  rates  in  Figures  7 25,  7 26,  7 29,  and  7 30  are 
“typical”,  we  averaged  convergence  rate  for  fifteen  realization.  Since  cost  varies  for 
each  realization,  we  used  the  normalized-cost-difference  that  is  defined  as 

- <fr(x<100>) 

- $(x<10°>)  ’ ' ' 

where  x(0)  and  x('W0)  are  the  uniform  initial  estimate  and  the  100th  APML  iterate, 
respectively.  The  reason  why  we  used  cc(100)  is  that  the  APML  algorithm  “almost” 
converges  with  100  iterations,  which  means  that  the  PML  objective  function  does 
not  decrease  appreciably  after  100  APML  iterations.  In  Figures  7 31  and  7 32,  plots 
of  the  average  normalized-cost-difference  (averaged  normalized-cost-difference  over 
fifteen  realizations)  versus  CPU-time  are  shown  for  the  PML,  APML,  and  BSREM-II 
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algorithms  when  fifteen  7 and  14  minute  data  were  used,  respectively.  Figures  7 33 
and  7 34  show  plots  of  the  average  normalized-cost-difference  versus  CPU-time  for 
the  BSREM-II  algorithm  for  8 subsets  and  the  APML  algorithm  with  a few  OS-PML 
iterations  (2,  3,  and  4)  for  8 and  24  subsets  when  fifteen  7 and  14  minute  data  were 
used,  respectively.  The  figures  confirm  that  the  observed  convergence  rates  in  Figures 
7 25,  7 26,  7 29,  and  7 30  are  “typical” . 

Table  7 1 contains  the  CPU-times,  memory  accesses,  floating  point  operations, 
and  the  number  of  function  calls  (\/f  and  A (t))  per  iteration  for  the  algorithms  used 
in  the  experiments.  From  the  table,  it  is  shown  that  the  APML  algorithm  greatly 
reduced  the  number  of  iterations  for  convergence  compared  with  the  PML  algorithm. 
Convergence  in  the  table  means  that  an  algorithm  decreases  the  PML  objective  func- 
tion until  it  equals  the  5000fh  iterate  of  the  PML  algorithm,  denoted  by  x*.  Although 
the  BSREM-II  algorithm  did  not  converge  until  10,000  iterations,  the  algorithm  “al- 
most” converged  at  the  point  because  |4>(x^10000')  — 4>(cc*)|  ~ 1,  where  a*10000  is  the 
10000<fe  BSREM-II  iterate  (note:  |<F(*(1000))  - T(x(10000))|  « 1). 

7.3  PCiPS  Algorithm 

To  evaluate  the  probability  matrix  estimation  method  in  Chapter  6,  we  applied 
the  probability  correction  in  projection  space  (PCiPS)  algorithm  to  plane  21  from 
the  real  thorax  phantom  data  (unless  noted,  the  data  are  from  14  minute  scans)  and 
compared  it  to  the  APML  algorithm  with  a geometric  probability  matrix  Vc.  The 
probability  matrix  Vc  was  computed  using  the  angle-of-view  method  [16]  with  correc- 
tions for  errors  due  to  attenuation  and  detector  inefficiency.  As  in  Sections  7.1  and 
7.2,  to  get  the  attenuation  correction  factors,  post-injected  transmission  scan  data 
was  collected  for  three  minutes  and  the  attenuation  correction  method  by  Anderson 
et  al.  [9]  was  employed.  A normalization  file  was  used  to  correct  for  detector  ineffi- 
ciency. Finally,  the  randoms  were  used  as  noise  free  estimates  of  the  mean  numbers 
of  accidental  coincidences.  The  PCiPS  algorithm  used  one  iteration  to  estimate  the 
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(a)  Convergence  Rate  (plane  10) 


x 1 05  (b)  Convergence  Rate  (plane  21 ) 


Figure  7 23:  Convergence  rate  comparison  of  the  APML  algorithm  for  different  values 
of  t when  14  minute  data  were  used. 
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(a)  Number  of  Iterations  to  Converge  (plane  10) 
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(b)  Number  of  Iterations  to  Converge  (plane  21 ) 
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Figure  7 24:  Number  of  iterations  for  the  APML  algorithm  to  decrease  the  PML 
objective  function  to  4>(aP)  are  shown  for  different  values  of  e when  14  minute  data 
were  used,  where  x*  is  the  5000th  iterate  of  the  PML  algorithm. 
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(a)  Convergence  Rate  (plane  10) 


(b)  Convergence  Rate  (plane  21) 


Figure  7 25:  Convergence  rate  comparison  of  the  PML,  APML  and  the  BSREM-II 
algorithms  when  14  minute  data  were  used. 
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(a)  Convergence  Rate  (plane  10) 


(b)  Convergence  Rate  (plane  21) 


Figure  7 26:  Convergence  rate  comparison  of  the  APML  algorithm  with  different 
number  of  OS-PML  iterations  for  8 and  24  subsets,  and  the  BSREM-II  algorithm  for 
8 subsets  when  14  minute  data  were  used.  For  the  APML  algorithm  e = 0.01  was 
used. 
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(a)  APML,  5 sec 


(b)  BSREM-tl,  5 sec 


(c)  APML,  15  sec 


(d)  BSREM-II,  15  sec 


(e)  APML,  25  sec 


(f)  BSREM-II,  25  sec 


(g)  APML,  35  sec 


(h)  BSREM-II,  35  sec 


Figure  7 27:  APML  and  BSREM-II  iterates  for  plane  10  are  shown  for  different 
CPU-times  when  14  minute  data  was  used.  The  APML  iterates  were  generated  by 
the  APML  algorithm  with  e = 0.01  and  2 OS-PML  iterations  for  24  subsets  The 
BSREM-II  iterates  were  generated  by  the  BSREM-II  algorithm  for  8 subsets.  For 
display  purposes,  all  the  images  were  adjusted  so  that  they  have  the  same  dynamic 
range. 
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(a)  APML,  5 sec 


(b)  BSREM-II,  5 sec 


<1  « 


(c)  APML,  15  sec 


(d)  BSREM-II,  15  sec 


% •* 


(e)  APML,  25  sec 


(f)  BSREM-II,  25  sec 


(g)  APML,  35  sec 


(h)  BSREM-II,  35  sec 


Figure  7 28:  APML  and  BSREM-II  iterates  for  plane  21  are  shown  for  different 
CPU-times  when  14  minute  data  was  used.  The  APML  iterates  were  generated  by 
the  APML  algorithm  with  e = 0.01  and  2 OS-PML  iterations  for  24  subsets  The 
BSREM-II  iterates  were  generated  by  the  BSREM-II  algorithm  for  8 subsets.  For 
display  purposes,  all  the  images  were  adjusted  so  that  they  have  the  same  dynamic 
range. 
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(a)  Convergence  Rate  (plane  10) 


(b)  Convergence  Rate  (plane  21) 


Figure  7 29:  Convergence  rate  comparison  of  the  PML,  APML  and  the  BSREM-II 
algorithms  when  7 minute  data  were  used. 
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(a)  Convergence  Rate  (plane  10) 


(b)  Convergence  Rate  (plane  21 ) 


Figure  7 30:  Convergence  rate  comparison  of  the  APML  algorithm  with  different 
number  of  OS-PML  iterations  for  8 and  24  subsets,  and  the  BSREM-II  algorithm  for 
8 subsets  when  7 minute  data  were  used.  For  the  APML  algorithm  e = 0.01  was 
used. 
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(a)  Convergence  Rate  (plane  10) 


(b)  Convergence  Rate  (plane  21) 


Figure  7-31:  Average  convergence  rate  comparison  of  the  PML,  APML  and  the 
BSREM-II  algorithms  when  7 minute  data  was  used. 
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(a)  Convergence  Rate  (plane  10) 


(b)  Convergence  Rate  (plane  21) 


Figure  7 32:  Average  convergence  rate  comparison  of  the  PML,  APML  and  the 
BSREM-II  algorithms  when  14  minute  data  was  used. 
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(a)  Convergence  Rate  (plane  10) 


(b)  Convergence  Rate  (plane  21) 


Figure  7 33:  Average  convergence  rate  comparison  of  the  APML  algorithm  with 
different  number  of  OS-PML  iterations  for  8 and  24  subsets,  and  the  BSREM-II 
algorithm  for  8 subsets  when  7 minute  data  was  used.  For  the  APML  algorithm 
e = 0.01  was  used. 
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(a)  Convergence  Rate  (plane  10) 


(b)  Convergence  Rate  (plane  21) 


Figure  7 34:  Average  convergence  rate  comparison  of  the  APML  algorithm  with 
different  number  of  OS-PML  iterations  for  8 and  24  subsets,  and  the  BSREM-II 
algorithm  for  8 subsets  when  14  minute  data  was  used.  For  the  APML  algorithm 
e = 0.01  was  used. 
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Table  7 1:  Comparison  of  the  computational  complexity  per  iteration. 


PML 

BSREM-II 

APML 

CPU-time  (sec) 

0.271565 

0.58836  {R  = 8) 

0.49828 

Memory  Read 

2 P 

2 P 

3 P 

Mul/Div 

2 P + I + 24  J 

2P  + I + 12RJ 

4P  + 8/  + 58  J 

Add/Sub 

2P  + 31J 

2P  + / + 16RJ 

4P  + 3/  + 66  J 

Vt 

J 

- 

J 

m 

8 J 

8 RJ 

16  J 

Nc  for  plane  10 

791 

N/A 

359  (e  = 0),  116  (e  = 0.01) 

Nc  for  plane  21 

1362 

N/A 

652  (e  = 0),  136  (e  = 0.01) 

The  letters  P and  R denote  the  number  of  non-zero  elements  in  the  probability  matrix 
V and  the  number  of  subsets  for  the  BSREM-II  algorithm,  respectively.  The  letters 
I and  J represent  the  number  of  data  points  and  voxels,  respectively.  Nc  denotes  the 
number  of  iterations  for  convergence.  All  the  algorithms  were  computed  on  a Dell 
Inspiron-5150  computer.  Convergence  in  the  table  means  that  an  algorithm  decreases 
the  PML  objective  function  until  it  equals  4>(:e*),  where  x*  is  the  5000(/l  PML  iterate. 

probability  matrix.  In  other  words,  given  the  MLEM  image  based  on  Vc,  the  PCiPS 
algorithm  first  estimated  the  probability  matrix,  referred  to  as  'Ptrue,  and  then  im- 
ages were  reconstructed  by  the  APML  algorithm  with  100  iterations.  In  the  PCiPS 
algorithm,  two  choices  (15  and  63)  for  the  parameter  r were  examined  in  the  experi- 
ments. We  used  the  log-cosh  function  A (t)  = log(cosh(|))  with  5 = 50  as  the  penalty 
function. 

The  images  in  Figures  7 35  (a)  and  (b)  were  obtained  by  the  MLEM-S  and 
APML  algorithms  with  Vc.  The  images  in  Figures  7 35  (c)  and  (d)  were  obtained 
by  the  APML  algorithm  using  Ptrue  with  r = 15  and  r = 63,  respectively.  For 
the  APML  images  in  Figure  7 35,  (3  was  chosen  so  that  the  background  standard 
deviation  is  approximately  same  (background  standard  deviation  is  approximately 
48).  Using  5 iterations,  the  MLEM-S  image  with  a background  standard  deviation  of 
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48  was  obtained.  Figure  7 35  (a)  illustrates  that  the  tumors  in  the  MLEM-S  image  is 
too  smooth.  Figures  7 35  (b),  (c),  and  (d)  show  that  the  APML  images  visually  look 
similar  to  each  other.  However,  they  are  quite  different  especially  around  tumors. 
This  claim  is  supported  by  Figure  7 36  that  is  a line  plot  (the  row  is  shown  in  Figure 
7 13)  of  the  images  in  Figure  7 35.  For  the  row  under  consideration,  it  can  be  seen 
from  the  line  plots  that  the  APML  images  in  Figures  7 35  (c)  and  (d)  have  a higher 
degree  of  contrast  than  the  other  images.  For  the  images  in  Figure  7 35,  the  large 
tumor  contrast  of  the  APML  image  in  (d)  was  59%,  17%,  and  7%  higher  than  the 
images  in  (a),  (b),  and  (c),  respectively.  The  increased  contrast  of  the  APML  image 
in  (d)  for  the  small  tumor  with  respect  to  the  images  in  (a),  (b),  and  (c)  was  60%, 
20%,  and  14%,  respectively.  The  increased  tumor  distinguishability  of  the  APML 
image  in  (d)  with  respect  to  the  images  in  (a),  (b),  and  (c)  was  30%,  10%,  and  5%, 
respectively. 

Figures  7 37  (a)  and  (b)  are  plots  of  the  average  contrast  of  the  large  tumor  and 
small  tumor  versus  the  average  background  standard  deviation  using  fifteen  realiza- 
tions for  plane  21,  respectively.  Further,  plots  of  the  average  standard  deviation  of 
the  large  tumor  and  the  average  distinguishability  of  two  tumors  versus  the  average 
background  standard  deviation  for  the  fifteen  realizations  are  shown  in  Figures  7 37 
(c)  and  (d),  respectively.  The  contrast  and  distinguishability  curves  of  the  APML 
algorithm  with  the  estimated  probability  matrices  lie  above  the  corresponding  curves 
of  the  other  algorithms.  Thus,  for  a fixed  level  of  background  noise,  the  APML  im- 
ages generated  using  the  estimated  probability  matrices,  on  average,  have  the  greatest 
contrast  and  distinguishability. 

Now,  we  consider  fifteen  realizations  of  7 minute  real  phantom  data  for  plane  21. 
Figures  7 38  (a)  and  (b)  are  plots  of  the  average  contrast  of  the  large  tumor  and  small 
tumor  versus  the  average  background  standard  deviation  using  fifteen  realizations, 
respectively.  A plot  of  the  average  standard  deviation  of  the  large  tumor  versus  the 
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average  background  standard  deviation  for  the  fifteen  realizations  is  shown  in  Figure 
7 38  (c).  Also,  in  Figure  7 38  (d),  we  provide  a plot  of  the  average  distinguishability 
of  two  tumors  versus  the  average  background  standard  deviation  for  fifteen  realiza- 
tions. As  in  the  experiments  with  the  14  minute  real  phantom  data,  it  is  evident 
that  the  APML  algorithm  with  the  estimated  probability  matrices  outperform  the 
MLEM-S  and  APML  algorithms  with  Vc  in  terms  of  contrast  recovery  and  tumor 
distinguishability. 

Figures  7 39  (a)  and  (b)  are  the  images  obtained  by  the  MLEM-S  and  APML 
algorithms  using  Vc , respectively,  for  a 7 minute  data.  Figures  7 39  (c)  and  (d)  are 
the  images  obtained  by  the  APML  algorithm  using  'Pt"lc  with  r = 15  and  r = 63, 
respectively,  for  a 7 minute  data.  Figure  7 39  (a)  is  the  7th  MLEM  iterate.  For 
the  APML  algorithm,  100  iterations  were  used.  For  the  images  in  Figures  7 39  (b), 
(c),  and  (d),  (3  was  chosen  so  that  the  standard  deviation  of  their  backgrounds  are 
approximately  36.  As  in  the  experiments  with  the  14  minute  data,  the  images  in 
Figures  7 39  (b),  (c),  and  (d)  look  similar  to  each  other.  Figure  7 40  is  the  line  plot 
of  the  images  in  Figure  7 39.  Also,  as  in  the  experiments  with  the  14  minute  data, 
for  the  row  under  consideration,  the  APML  images  in  Figures  7 39  (c)  and  (d)  have 
a higher  degree  of  contrast  than  the  other  images.  For  the  images  in  Figure  7 39,  the 
large  tumor  contrast  of  the  APML  image  in  (d)  was  50%,  19%,  and  6%  higher  than 
the  images  in  (a),  (b),  and  (c),  respectively.  The  increased  contrast  of  the  APML 
image  in  (d)  for  the  small  tumor  with  respect  to  the  images  in  (a),  (b),  and  (c)  was 
47%,  19%,  and  7%,  respectively.  The  increased  tumor  distinguishability  of  the  APML 
image  in  (d)  with  respect  to  the  images  in  (a),  (b),  and  (c)  was  18%,  10%,  and  — 1%, 
respectively. 
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(a)  MLEM-S 


(b)APML 
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(c)  PCiPS,T=15 


(d)  PCiPS,t=63 
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Figure  7 35:  Comparison  of  emission  images  when  a 14  minute  real  phantom  data 
for  plane  21  was  used:  (a)  MLEM-S  image  with  Vc,  (b)  APML  image  with  Vc , (c) 
APML  image  with  'Ptrue  and  r = 15,  (d)  APML  image  with  'Ptrue  and  r = 63.  The 
image  in  (a)  is  from  5 iterations,  while  the  images  in  (b),  (c),  and  (d)  are  from  100 
iterations.  For  the  images  in  (b),  (c),  and  (d),  (3  was  chosen  in  such  a way  that  the 
standard  deviation  of  the  background  is  approximately  48.  Specifically,  (3  — 1/32, 
1/32,  and  0.028  for  the  images  in  (b),  (c),  and  (d),  respectively.  For  display  purposes, 
all  the  images  were  adjusted  so  that  they  have  the  same  dynamic  range. 
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Line  plot 


Figure  7 36:  A line  plot  comparison  of  the  reconstructed  images  in  Figure  7 35. 
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(a)  Large  Tumor 


(b)  Small  Tumor 


(c)  Large  Tumor 


Figure  7 37:  Plots  of  the  average  contrast  of  the  large  and  small  tumors  versus  the 
average  background  standard  deviation  are  shown  in  (a)  and  (b),  respectively.  A plot 
of  the  average  standard  deviation  of  the  large  tumor  versus  the  average  background 
standard  deviation  is  shown  in  (c).  In  (d),  a plot  of  the  average  distinguishability 
of  two  tumors  versus  the  average  background  standard  deviation  is  shown.  Fifteen 
realizations  were  used  and  14  minute  real  phantom  data  for  plane  21  was  used  in 
the  study.  For  the  MLEM-S  curves,  the  images  from  iterations  5 — 160  were  used. 
For  the  other  curves,  the  images  were  reconstructed  using  two  hundred  iterations  for 
(3  — 2~4  — 2~12. 
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(a)  Large  Tumor 


(b)  Small  Tumor 


Figure  7 38:  Plots  of  the  average  contrast  of  the  large  and  small  tumors  versus  the 
average  background  standard  deviation  are  shown  in  (a)  and  (b),  respectively.  A plot 
of  the  average  standard  deviation  of  the  large  tumor  versus  the  average  background 
standard  deviation  is  shown  in  (c).  In  (d),  a plot  of  the  average  distinguishability 
of  two  tumors  versus  the  average  background  standard  deviation  is  shown.  Fifteen 
realizations  were  used  and  7 minute  real  phantom  data  for  plane  21  was  used  in 
the  study.  For  the  MLEM-S  curves,  the  images  from  iterations  5 — 160  were  used. 
For  the  other  curves,  the  images  were  reconstructed  using  two  hundred  iterations  for 
(3  = 2-4  — 2-12. 
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(a)  MLEM-S 


(b)APML 


(c)  PCiPS,x=15 


(d)  PCiPS,x=63 
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Figure  7 39:  Comparison  of  emission  images  when  a 7 minute  real  phantom  data 
for  plane  21  was  used:  (a)  MLEM-S  image  with  Vc,  (b)  APML  image  with  Vc , (c) 
APML  image  with  ,ptrue  and  r = 15,  (d)  APML  image  with  f>tTue  and  r = 63.  The 
image  in  (a)  is  from  7 iterations,  while  the  images  in  (b),  (c),  and  (d)  are  from  100 
iterations.  For  the  images  in  (b),  (c),  and  (d),  (5  was  chosen  in  such  a way  that  the 
standard  deviation  of  the  background  is  approximately  36.  Specifically,  (3  = 1/32, 
1/32,  and  0.028  for  the  images  in  (b),  (c),  and  (d),  respectively.  For  display  purposes, 
all  the  images  were  adjusted  so  that  they  have  the  same  dynamic  range. 
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Line  plot 


Figure  7 40:  A line  plot  comparison  of  the  reconstructed  images  in  Figure  7 39. 


CHAPTER  8 

CONCLUSIONS  AND  FUTURE  WORK 

8.1  Conclusions 

The  PML  algorithm  we  developed  for  reconstructing  emission  images  generates 
nonnegative  emission  mean  estimates  and  monotonically  decreases  the  PML  objec- 
tive function.  The  algorithm  is  straightforward  to  implement  and  can  incorporate 
any  penalty  function  that  satisfies  the  mild  assumptions  (AS3)-(AS8).  Under  cer- 
tain conditions  (i.e. , (AS1)-(AS8)),  the  PML  objective  function  is  strictly  convex. 
And,  for  the  case  where  the  PML  objective  function  is  strictly  convex,  it  has  been 
proven  that  the  PML  algorithm  converges  to  the  minimizer  of  the  PML  objective 
function.  Although  the  tradeoff  between  resolution  and  noise  can  be  controlled  by 
certain  regularization  hyperparameters  (i.e.,  /3  and  5),  like  many  researchers,  we  have 
not  determined  a way  to  choose  the  parameters  so  that  the  data-fit  and  a prior 
knowledge  are  optimally  “balanced”. 

A fast  version  of  the  PML  algorithm,  called  the  APML  algorithm,  was  developed 
that  retains  the  properties  of  the  PML  algorithm.  The  APML  algorithm  is  based 
on  the  PML  algorithm  and  pattern  search  idea  of  Hooke  and  Jeeve.  However,  we 
modified  the  direction  vector  to  account  for  the  PET  image  reconstruction  problem. 
The  APML  algorithm  generates  nonnegative  emission  mean  estimates,  monotonically 
decreases  the  PML  objective  function,  and  can  accommodate  the  same  class  of  penalty 
functions  as  the  PML  algorithm.  Importantly,  it  has  been  proven  that  the  APML 
algorithm  converges  to  the  minimizer  of  the  PML  objective  function  when  the  PML 
objective  function  is  strictly  convex.  Although  the  APML  algorithm  requires  an 
additional  parameter  e that  is  used  to  define  the  direction  vector,  experiments  using 
real  phantom  data  demonstrated  that  fast  convergence  rates  were  obtained  over  a 
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wide  range  of  values  for  e (e.g.,  0 < e < 0.01).  This  means  that  the  APML  algorithm 
is  robust  with  respect  to  the  parameter  e. 

In  experiments  using  real  phantom  data,  it  was  shown  that  the  APML  algorithm 
decreased  the  PML  objective  function  about  three  times  faster  than  the  PML  algo- 
rithm. Specifically,  the  APML  algorithm  for  0 < e < 0.01  needed  about  one  third 
of  the  CPU-time  that  was  necessary  for  the  PML  algorithm  to  decrease  the  PML 
objective  function  to  a “practical  minimizer”  of  the  objective  function.  By  practi- 
cal minimizer,  we  mean  a PML  iterate  that  has  a PML  objective  value  (i.e. , cost) 
that,  practically  speaking,  cannot  be  appreciably  decreased  with  increasing  iterations. 
Specifically,  the  minimum  PML  objective  function  value  resulted  from  the  5000  it- 
eration of  the  PML  algorithm.  We  compared  the  convergence  rate  of  the  APML 
algorithm  to  an  ordered-subsets  method  in  [35]  because  the  ordered-subsets  based 
algorithm  is  said  to  converge  to  the  nonnegative  minimizer  of  the  PML  objective 
function.  At  the  early  iterations,  the  ordered-subsets  method  decreased  the  PML  ob- 
jective function  at  a faster  rate  than  the  APML  algorithm.  However,  after  about  10 
seconds  in  CPU-time,  the  APML  algorithm  decreased  faster.  It  was  also  shown  that 
when  the  APML  algorithm  utilizes  an  ordered-subsets  based  PML  iteration  for  a few 
early  iterations,  the  resulting  algorithm  decreased  the  PML  objective  function  more 
than  about  1.2  times  faster  than  the  ordered-subsets  method  for  the  early  iterations 
(i.e.,  less  than  10  seconds  in  CPU-time). 

In  addition  to  the  PML  and  APML  algorithms,  we  also  proposed  a regularized 
image  reconstruction  algorithm  we  call  the  QEP  algorithm.  The  QEP  algorithm 
obtains  regularized  estimates  of  the  emission  means  through  the  use  of  an  iteration 
dependent  penalty  function  that  serves  to  preserve  edges  in  the  reconstructed  images. 
The  definition  of  the  penalty  function  was  motivated  by  an  analysis  of  the  surrogate 
function  for  a penalty  function  that  is  utilized  by  the  PML  algorithm.  In  the  QEP 
algorithm,  at  each  iteration,  the  iteration  dependent  penalty  function  is  found  and 
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the  next  iterate  is  originally  defined  to  be  a nonnegative  minimizer  of  sum  of  the 
negative  log  likelihood  function  and  the  penalty  function.  However,  since  it  is  not 
possible  to  solve  the  constrained  optimization  problem,  the  QEP  algorithm  alterna- 
tively defines  the  next  iterate  to  be  a nonnegative  minimizer  of  sum  of  the  de-coupled 
surrogate  function  for  the  negative  log  likelihood  function  and  the  iteration  dependent 
penalty  function.  It  is  important  to  understand  that  the  QEP  algorithm  defines  a 
new  objective  function  to  be  minimized  at  each  iteration.  Thus,  unlike  the  PML  and 
APML  algorithms,  the  QEP  algorithm  does  not  minimize  a single  objective  function. 
This  means  that  the  usual  mathematical  tools  for  investigating  convergence  are  un- 
available. Despite  its  theoretical  drawbacks,  the  QEP  algorithm  performed  extremely 
well  in  experiments  with  computer-generated  phantom  data  and  real  thorax  phantom 
data,  and  outperformed  the  PML  (or  APML)  algorithm  and  PWLS  algorithm  [42]  in 
terms  of  contrast  recovery. 

In  experiments,  the  images  produced  by  the  PML  (or  APML)  and  QEP  algo- 
rithms had  greater  contrast  and  “distinguishability”  than  the  MLEM-S  and  PWLS 
images  for  a fixed  level  of  background  noise.  The  MLEM-S  images  were  produced  by 
early  termination  of  the  MLEM  algorithm  [16]  and  the  PWLS  images  were  produced 
by  the  PWLS  algorithm  [42].  Specifically,  for  a 14  minute  real  phantom  data  set,  the 
contrast  of  the  large  tumor  in  the  QEP  image  was  37%,  3%,  and  34%  higher  than 
the  MLEM-S, PML,  and  PWLS  images,  respectively.  With  respect  to  the  MLEM-S, 
PML,  and  PWLS  images,  the  contrast  of  the  small  tumor  in  the  QEP  image  was 
46%,  9%,  and  43%  higher,  respectively.  The  increased  tumor  distinguishability  of  the 
QEP  image  with  respect  to  the  MLEM-S,  PML,  and  PWLS  images  was  20%,  5%, 
and  35%,  respectively.  Moreover,  qualitatively  speaking,  the  spatial  extent  of  the 
tumors  were  more  easily  resolved  with  the  PML  and  QEP  images.  Since  the  QEP 
algorithm  yielded  the  greatest  contrast  for  a fixed  level  of  background  noise,  it  may 
be  particularly  well  suited  for  tumor  detection  applications. 
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Errors  caused  by  scatter,  non-collinearity,  detector  penetration,  and  positron 
range  have  the  net  effect  of  introducing  blur  into  PET  images.  In  theory,  these  errors 
can  be  corrected  by  determining  a suitable  probability  matrix.  However,  it  is  difficult 
to  determine  such  a probability  matrix  because  the  required  modelling  is  impractical. 
In  Chapter  6,  we  first  assume  that  the  “true”  probability  matrix  for  the  observed 
emission  data  is  a product  of  an  unknown  nonnegative  matrix,  called  a scatter  ma- 
trix, and  a “conventional”  probability  matrix.  The  conventional  probability  matrix 
is  generated  from  the  geometry  of  the  PET  scanner  and  image  space  to  be  recon- 
structed, along  with  certain  standard  corrections  for  errors.  We  developed  a method, 
referred  to  as  the  joint  minimum  Kullback-Leibler  (KL)  distance  method,  that  aims 
to  reduce  blur  in  PET  images.  In  the  joint  minimum  KL  distance  method,  the  scatter 
matrix  and  emission  means  are  jointly  estimated  by  minimizing  the  distance  the  data 
and  model  parameters.  Because  of  the  difficulty  of  the  minimization  problem,  the 
number  of  unknowns  in  the  scatter  matrix  is  reduced  and  an  alternating  minimization 
algorithm  is  developed.  Thus,  given  an  estimate  for  the  scatter  matrix,  an  estimate 
for  the  emission  means  is  obtained.  The  estimate  for  the  emission  means  can  then  be 
used  to  generate  an  improved  estimate  for  the  scatter  matrix.  Alternating  between 
these  two  steps  leads  to  the  desired  estimates  for  the  scatter  matrix  and  emission 
means.  Once  the  estimate  of  the  scatter  matrix  is  obtained,  the  estimate  for  the  true 
probability  matrix  is  the  a product  of  the  estimated  scatter  matrix  and  conventional 
probability  matrix.  Then,  a regularized  image  reconstruction  algorithm  is  applied 
to  the  emission  data  using  the  estimated  true  probability  matrix.  In  experiments 
with  real  phantom  data,  the  APML  algorithm  is  employed  because  of  its  fast  con- 
vergence. The  contrast  of  the  reconstructed  images  generated  using  the  estimated 
probability  matrix  was  more  accurate  than  the  reconstructed  images  generated  using 
the  conventional  probability  matrix. 
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8.2  Future  Work 

Alt  hough  the  APML  algorithm  decreases  the  PML  objective  function  faster  than 
the  PML  algorithm  in  experiments,  it  is  not  clear  that  how  fast  the  APML  algorithm 
converges  in  theory.  Consequently,  it  would  be  worthwhile  to  determine  the  theoret- 
ical rate  of  convergence  of  the  APML  algorithm.  A related  research  direction  would 
be  to  determine  the  parameter  e that  maximizes  convergence  rate  of  the  APML  al- 
gorithm. Keeping  in  mind  Hook  and  Jeeve’s  idea,  it  is  possible  that  the  convergence 
rate  of  the  APML  algorithm  could  be  improved  by  using  a “better"  direction  vector. 
We  say  this  because,  in  experiments  where  the  “best”  direction  vector  xn  1 1 — x* 
was  used  ( x * is  the  minimizer  of  the  PML  objective  function)  the  APML  algorithm 
converged  in  only  a few  iterations  (e.g.,  3 iterations). 

A key  assumption  of  the  PCiPS  algorithm  is  that  the  mean  number  of  photon 
pairs  recorded  by  the  ith  detector  pair  is  a weighted  sum  of  the  mean  number  of 
photon  pairs  that  would  have  been  recorded  by  a certain  set  of  detector  pairs  if 
there  were  no  errors  due  to  scatter  and  non-collinearity.  Currently,  the  chosen  set  of 
detector  pairs  is  the  projection  for  which  the  ith  detector  pair  belongs.  In  Chapter  6, 
the  assumption  was  justified  for  the  case  where  the  image  space  had  approximately 
uniform  attenuation  (e.g.,  brain).  However,  for  an  image  space  with  non-uniform 
attenuation,  another  choice  for  the  set  of  detector  pairs  may  be  more  appropriate. 
Thus,  it  would  be  worthwhile  to  revisit  the  assumptions  of  the  PCiPS  algorithm  so 
as  to  broaden  its  application. 

Increasingly,  three  dimensional  (3D)  PET  scanners  are  being  used  to  perform 
whole-body  scans.  Thus,  it  may  be  beneficial  to  the  PET  community  to  extend 
the  proposed  algorithms  to  3D.  One  practical  problem  of  3D  PET  is  that  it  takes 
an  extremely  long  time  to  reconstruct  images  because  the  number  of  data  points 
is  increased.  So,  with  3D  implementations,  one  should  consider  parallel  computing 
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techniques  such  as  a multiprocessor  approach  [70]  or  a computer  cluster  implemen- 
tation [71].  Observe  that  the  proposed  algorithms  can  be  parallelized  because  all 
the  pixel  values  are  updated  simultaneously.  By  contrast,  the  penalized  weighted 
least-squares  algorithm  in  [42]  updates  pixel  values  sequentially  so  that  the  algorithm 
cannot  be  parallelized. 

The  proposed  algorithms  were  assessed  with  real  phantom  data.  However,  a more 
in-depth  assessment  would  include  other  types  of  real  phantoms  (e.g.,  brain  phan- 
toms) and  patient  data.  Another  consideration  is  to  study  the  proposed  algorithms 
more  when  errors  due  to  detector  penetration,  non-collinearity  of  line-of-response,  and 
positron  range  are  corrected.  We  did  not  consider  those  errors  in  the  dissertation. 


APPENDIX  A 

IIUBER’S  SURROGATE  FUNCTIONS 

In  this  appendix,  we  present  Hnber’s  proof  of  the  inequality  A 0)(£)  > A (f)  for 
all  f.  From  (3.14),  recall  that  A(n)  is  defined  to  be 

A <">(f)  = A(f(n))  + A(t(n))(<-f(n))  + ^7(i(n))(^-i(n))2  (AT) 

= A (f(n))  + A (f(n))(f  - f(n))  - 7(f(n))ff(n)  + ^7(f(n>){f2  + (£(ti))2}(A.2) 

= A (f<">)  + A(f(n))(f  - f(n))  - A(f(n))f  + ^A(f(n))f{n)  + A.3) 

z z 

= A(f(n))  - \\{t[n))t[n)  + )U(f(n))f2  , (A. 4) 

z z 

where  we  used  that  7(f)  = Consider  the  function  /(f)  = A*'^(f)  — A(f)  and  its 
first  derivative 

/(f)  = [7(f(n))-7(W-  (A-5) 

From  (A. 5)  and  the  assumption  that  7(f)  is  nonincreasing  over  [0,  00)  (see  (AS6)),  it 
follows  that  /(f)  < 0 over  [0,  f^]  and  /(f)  > 0 over  [fO/  00).  These  inequalities  and 
the  fact  that  /(f(n))  = 0 imply  that  /(f)  > 0 for  f > 0,  or  equivalently  A(")(f)  > A(f) 
for  f > 0.  It  is  clear  that  A(n)(f)  > A(f)  for  f < 0 because  of  the  symmetry  of  A(n)(f) 
and  A (f)  (see  (AS3)). 
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APPENDIX  B 

SURROGATE  FUNCTIONS  FOR  PENALTY  FUNCTION 
Our  objective  in  this  appendix  is  to  demonstrate  that  the  surrogate  function  A*''f 
can  be  expressed  as  (3.22).  First,  we  make  the  following  observation:  Since  7 (t)  = 

(see  (AS6)),  g 74  in  (3.17)  can  be  written  as 
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+ 27(x<”>  - xr)xr4n' + a< 


(n)\A«)A«) 


(n) 

X)  — X 


(«)> 


(B.4) 


= 7(x  n)  - x<n)) 


, x, 


xf+x^2 


2 

(0\2 


}z 


Xfc 


x'n)+x<^2' 


+ A(x<n)  - x^,n) ) 


+ 27(^n)  - x{k))xf)x{k) 


7(^n)  - ^ln)) 

fr 

X<"»+X<">1 

2 

> + {xfc 

xP  + X? 

2 J 

2 

“ ^7(^n)  - x 

P)G" 

1 - xf>f  + A(i<">  - 

xP) 

l(x\n)  ~ x[n)) 

- 

xjn)  + xj;n) ) 

2 

> + jxfc 

xP  + x<”> 

2 J 

2 
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where  Cj?  = A(®Jn)  - x{n))  - |A(xJn)  - x^X3^  - 4? 


jn}).  Thus,  A(n>  in  (3.18)  can  be 


written  as 


(B.8) 


j= 1 keNj  v ' 

where  C<n^  = J2j=i  YlkeN  To  get  (B.8),  we  used  the  fact  that  the  function  7 

is  even  symmetric  (i.e.,  7(f)  = 7 (— t)  for  all  t)  and  ujjk  = wkj  for  all  j and  k.  We  also 
used  the  fact  that  the  jth  voxel  is  excluded  from  the  set  Nj,  and,  if  the  kfh  voxel  is 
in  Nj,  then  the  jth  voxel  is  in  Nk . 


APPENDIX  C 

STRICT  CONVEXITY  OF  PML  OBJECTIVE  FUNCTION 
We  will  prove  that  the  PML  objective  function  (I>  is  strictly  convex  over  the  set 
{x  . x > 0}.  First,  we  will  show  that  the  Hessian  of  A,  denoted  by  S,  satisfies  the 
following  properties: 

• (SI)  z'Sz  > 0 for  all  z,  where  denotes  the  transpose  operator 

• (S2)  z Sz  = 0 only  when  2 = 0 or  2 = cl  for  some  c ^ 0. 

The  (i l,m)th  element  of  S is  given  by 


[S]lm  = < 


l = m 

—2wimX(xi  - xm),  l ^ m and  m € Ni 
0,  otherwise 


(C.l) 


Using  (C.l),  it  follows  that 

j 


(=i 


m&Ni 


m^Ni 


Sz  — ^ ' Z[  ^ 2 Z[  ^ ' tC;mA(x/  Xm)  ^ ^ ^ lTn A (-L  *^m)^  (^•■2) 

(C.3) 
(C.4) 


= X!  - xm)zi{zi  - Zm) 

1=1  mE.Ni 
J 

— 2 ^ ^ ^ ^ *^m)  %m) 

1=1  mENi,m>l 


Since  wim\(xi  — xm ) > 0 for  all  l and  m (see  (AS5)),  (SI)  and  (S2)  follow  from 
(C.4). 

To  prove  that  <3?  is  strictly  convex,  it  suffices  to  show  that  zTz  > 0 for  z / 0, 
where  T is  the  Hessian  of  <F.  Note  that  T = 7 Z + (3S,  where  1Z  is  the  Hessian  of  -'F. 
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The  (l,  m)th  element  of  1Z  equals 


By  (Si)  and  the  fact  that 


[7^m  = X>r-p  \tm]2  ■ 
jri  [Px  + p}j 

(C.5) 

1 A , 3 

J 

t=l  1 JI  m—1 

( YlPuzi) 
1=1 

(C.6) 

ST'  1 ( [PZ\i  \2 
^‘'{[Px  + p],)  ’ 

(C.7) 

it  is  true  that  zTz  > 0 for  all  z.  By  (ASl)  and  (AS2),  it  follows  that 


ziiz=sydt(Jvl]- , ) 

^ V Wx  + p}J 


> 0 


1 V [Px  + p\ 

for  z = cl  when  c ^ 0.  Thus,  by  (S2),  it  is  concluded  that  zTz  > 0 for  z 7^  0 


(C.8) 


APPENDIX  D 

SOLUTION  TO  UNCONSTRAINED  OPTIMIZATION  PROBLEM 
IN  MODIFIED  APML  LINE  SEARCH 

We  first  show  in  this  appendix  that  the  surrogate  function  pO+P  in  (4.29)  is 
strictly  convex.  Then,  we  prove  that  the  solution  to  the  unconstrained  optimization 
problem  in  (4.30)  can  be  expressed  by  (4.31).  Consider  the  second  derivative  of  r(n+P; 

f<"+‘>  = J2  d"+1) + 0 x x W(4"+1>  - 4”+1,)U”+I>  - 4”+11)2  (D.1) 

i= 1 j= 1 keNj 

(observe:  f("+P  js  independent  of  a).  It  is  true  that  {x^}  is  bounded  for  all  n, 
7(f)  > 0 for  — oo  < t < oo,  and  7 is  a continuous  function  over  —00  < t < 00. 
Thus,  it  is  clear  that  the  second  term  in  the  right  hand  side  of  (D.l)  is  positive  for 
■y(«+P  ^ c\  provided  c ^ 0.  For  the  case  where  t/n+P  = cl  with  c 7^  0,  it  follows 
that  from  (4.27) 


A 


(n+l) 


c2d,(\V l],)2  „ > n 

(c[r,l]iL(n+1)+[’Px(n+1)]i+p02  ’ 

c2M[r l],)2 c<  n 

(cplliU^+V+lVX^+^i+pi)2  ’ 


(D.2) 


for  all  i.  Thus,  by  (AS1)  and  (AS2),  Yll=i  At,-n+1)  > 0 f°r  v(n+1)  = cl,  when  c^O. 
Consequently,  it  can  be  concluded  that  f(n+P  > 0 for  i/n+P  ^ 0.  This  result  implies 
that  r<n+P  is  strictly  convex  for  t/n+P  ^ 0. 

Now,  we  will  show  that  the  expression  in  (4.31)  is  the  solution  to  the  uncon- 
strained optimization  problem  in  (4.30).  First,  recall  that. 

j 

A(n+P(a?)  = ^2  ^2  ^jfcA("+1)(xj  - xk ) , (D.3) 

j= 1 fceNj 
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where 


A {n+l\t)  = A(^"+1))  + A(f(n+1))(t  - f(n+1) ) + l-i{t(n+l])(t  - f(n+1))2  . (D.4) 

For  An+0  = ahn+1>  — the  surrogate  function  f^n+1^  can  be  expressed  as 


r(n+1)(a)  = 0(n+1)(a)  + /3A(n+1)(a:(n+1)  + av(n+l) ) 


(D.5) 


= £«!”«>(„) + /3££  ujk\(xy+l)  - x[n+1))  + c 


i= 1 


j= 1 keNj 


\^(n+1)  _ ^.("+1)\r, ,(«+!)  _ „,("+!) 


+ u>jkx{xf+1)  - 4n+ 1 )K 


-vl  'at 


j= 1 keNj 

j= i tew,' 


1 / (n+1)  (n+l)\ / (n+1)  (rt+l)\ 2 _,2 


2 ' ')( 


**  ~vk 


) V , (D.6) 


where  C = X^=1{pi + log(dJ)}.  Thus,  using  the  definition  of  6)n+L>  in  (4.23),  the  first 
derivative  of  r^"+1\  denoted  by  r("+1\  is 


(n+l) 


r(n+1)(a)  = ]T{p<n+1W^"+1)(0)} 


i—  1 


+ pYIYI  ujkHxj  - 4n+1 ) W; 


) 


j= 1 keNj 
J 


+/?ee  W(*r+i)  -4n+,,)(»ri)  - 4n+,))2«  • to.?) 

j=i  fceVj 

Since  r(ra+1)  is  strictly  convex,  the  minimizer  of  r(n+0  can  be  found  by  setting  its 
first  derivative  to  zero: 


f(n+1)(a) 


= 0 . 


(D.8) 


Solving  (D.8)  results  in  the  expression  in  (4.31). 


APPENDIX  E 

CONVEXITY  OF  SURROGATE  FUNCTIONS 
FOR  OBJECTIVE  FUNCTIONS  IN  APML  LINE  SEARCH 


In  this  appendix,  our  objective  is  to  show  that  there  exists  a symmetric  positive 
definite  matrix  M,  which  is  independent  of  n,  such  that  f G+1)  > 2(u*n+1^),A/I(u(n+1^), 
where  Dn+1)  is  the  second  derivative  of  the  surrogate  function  FO+1)  in  (4.29).  Note 
that  TG+1)  is  shown  in  (D.l).  It  is  true  that  is  bounded  for  all  n,  7 (t)  > 0 

for  or  —00  < t < 00,  and  7 is  a continuous  function  over  — 00  < t < 00.  Thus,  there 
exists  a number  70  > 0 such  that  7(Xj,i+1^  — 1 1'>)  >70  for  all  j,  k,  and  n.  Hence, 

it  follows  that 

f<"+'>  > Y.  V+1)  + /37„  X X V+1)  - V+1>)2  (E  l) 

*=i  j= 1 fceiVj 

/ 

> Y.  ^+l)  + 2/370 {v^yn(v^)  , (E.2) 

i= 1 

where  the  (l,  m)th  element  of  H is  given  by 


E fcgjvjWHk,  l = m 


Mlm  = { ~ 


~^lm  ? 


0, 


l 7^  m and  m £ Ni  ■ 
otherwise 


(E.3) 


Note  that  the  matrix  H is  symmetric  because  wim  = wmi  for  all  l and  m.  Now,  we 
consider  the  term  Ei/4”+^-  First,  note  that  can  be  expressed  as  (see  (4.27)) 


(n+l) 


/(aSn+1))di([Pt;("+1)]i)2,  [Pv^+%  ± 0 

0,  [Pv(n+%  = 0 , 


(E.4) 
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where  f(t ) = ^ and 

a(„+1)  a | [Vv^\L^  + [Px^+%  + pu  [Pv^%  >0 
i [7>w(«+l)].f/(n+l)  + [Vx^+%  + ph  [Pv^n+%  < 0 

Also,  note  that  s-,!+1)  is  bounded  below  by  0 for  all  n and  i (see  (4.15)).  Moreover, 
if  we  assume  that  ?/n+1)  is  bounded  for  all  n,  it  follows  that  is  bounded  above 

for  all  n and  i.  To  see  this  fact,  we  only  need  to  consider  Z/n+1)  and  [/^n+1^  because 
cc(«+1)  is  bounded  for  all  n by  Proposition  1.  When  v'-1 1 ^ < 0 for  all  j,  it  follows 
that  l/n+1)  = — oo.  For  this  case,  the  fact  that  L^n+1^  = — oo  is  not  a problem 
because  sjn+1)  = ['Pv(?l+1)]!f7("+1)  + [' Px{-n+l',]i  + pt , where  t/(n+1)  is  a finite  number. 
Similarly,  when  > 0 for  all  j,  it  follows  that  U^n+ ^ = oo.  For  this  case, 

s-"+1)  = [Pi/n+1)].L(n+1)  + [Px^n+l\  + pi,  where  L<'n+1'1  is  a finite  number.  Thus, 
s<"+1)  is  bounded  for  all  n and  i.  Since  f(t ) > 0 for  0 < t < oo,  the  function  / is  a 
continuous  function  over  0 < t < oo,  and  sjn+1^  > 0 is  bounded  above,  there  exists  a 
number  /0  > 0 such  that  /(s-n+1))  > /0  for  all  n and  i.  Thus,  it  follows  that 

/ 

r(n+1)  > fo^di{[Vvin+%)2 + 2(3^0(vin+1)yn{v{n+1))  (E.6) 

i= 1 

> fQ(y(n+l))'V'VV{v(n+l))  + 2(3-f0(v{n+1))'H{v{n+1))  (E.7) 

> 2(u(n+1))'.M('u(Tl+1))  , (E.8) 

where  V = diag(d)  and  M = ^V'VV  + (3j0 Tfj.  Since  V'VV  and  H are  symmetric, 

the  matrix  M.  is  symmetric.  Now,  we  only  need  to  show  that  M.  is  positive  definite. 
From  the  second  term  in  the  right  hand  side  of  (E.l),  it  is  easy  to  see  that  2 Hz  > 0 
for  all  2 and  zHz  = 0 only  when  2 = 0 or  2 = cl  for  some  c ^ 0.  Moreover,  by 
(AS1)  and  (AS2),  it  follows  that 

I 2 

zV'VVz  = c2  dt  ([-PI],)  > 0 

i= 1 


(E.9) 
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for  z = cl  and  0.  Thus,  from  /0  > 0 and  /?7o  > 0,  it  can  be  concluded  that  M 
is  positive  definite. 
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